Open Source Adventures: Episode 79: Exploring Crystal Regular Expression API
In the previous episode we took a look at Ruby's Regular Expression API. I want to try a few more languages, and the most obvious one to start with is Crystal.
A lot of solutions work exactly like in Ruby, but some of the differences are interesting.
Test case
Crystal doesn't have %W, which is one of my favorite Ruby features, but in this case its non-interpolating and much less awesome relative %w will do.
Here's the test case:
%w[
  2015-05-25
  2016/06/26
  27/07/2017
].each do |s|
  p parse_date(s)
end
And the expected result:
[2015, 5, 25]
[2016, 6, 26]
[2017, 7, 27]
Solution 1
def parse_date(s)
  case s
  when %r[(\d\d\d\d)-(\d\d)-(\d\d)]
    [$1.to_i, $2.to_i, $3.to_i]
  when %r[(\d\d\d\d)/(\d\d)/(\d\d)]
    [$1.to_i, $2.to_i, $3.to_i]
  when %r[(\d\d)/(\d\d)/(\d\d\d\d)]
    [$3.to_i, $2.to_i, $1.to_i]
  end
end
The most straightforward solution works exactly as it did in Ruby with no changes.
Solution 2
#!/usr/bin/env crystal
def parse_date(s)
  case s
  when %r[(\d\d\d\d)-(\d\d)-(\d\d)], %r[(\d\d\d\d)/(\d\d)/(\d\d)]
    [$1.to_i, $2.to_i, $3.to_i]
  when %r[(\d\d)/(\d\d)/(\d\d\d\d)]
    [$3.to_i, $2.to_i, $1.to_i]
  end
end
Grouping several patterns in a single when clause works just like in Ruby.
Solution 3
def parse_date(s)
  case s
  when %r[(\d\d\d\d)([/-])(\d\d)\2(\d\d)]
    [$1.to_i, $3.to_i, $4.to_i]
  when %r[(\d\d)/(\d\d)/(\d\d\d\d)]
    [$3.to_i, $2.to_i, $1.to_i]
  end
end
Back-references also work just like in Ruby.
Solution 4
Now this does not work:
def parse_date(s)
  case s
  when %r[(\d\d\d\d)-(\d\d)-(\d\d)|(\d\d\d\d)/(\d\d)/(\d\d)]
    [($1 || $4).to_i, ($2 || $5).to_i, ($3 || $6).to_i]
  when %r[(\d\d)/(\d\d)/(\d\d\d\d)]
    [$3.to_i, $2.to_i, $1.to_i]
  end
end
The reason is that in Ruby $1 can be either a String or nil. In Crystal $1 is always a String, so if the group didn't match, accessing it raises an exception at runtime.
Crystal also has nilable equivalents $1?, $2?, etc. Notice that to make the whole expression non-nilable, we don't put ? on the last one. If the last one were nilable too, the expression could be nil, and calling .to_i on it would not compile:
def parse_date(s)
  case s
  when %r[(\d\d\d\d)-(\d\d)-(\d\d)|(\d\d\d\d)/(\d\d)/(\d\d)]
    [($1? || $4).to_i, ($2? || $5).to_i, ($3? || $6).to_i]
  when %r[(\d\d)/(\d\d)/(\d\d\d\d)]
    [$3.to_i, $2.to_i, $1.to_i]
  end
end
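To see what the ? suffix changes, here is a small sketch that looks at the match data directly instead of going through $1 and friends. The helper name year_groups is made up for this illustration and is not part of the original solutions; it uses the same two-alternative pattern and returns the nilable values of groups 1 and 4.

```crystal
# Hypothetical helper (not from the original solutions): returns the
# nilable values of capture groups 1 and 4 for the Solution 4 pattern.
def year_groups(s : String)
  md = s.match(%r[(\d\d\d\d)-(\d\d)-(\d\d)|(\d\d\d\d)/(\d\d)/(\d\d)])
  return nil unless md
  # md[n]? is nil when group n did not take part in the match;
  # md[n] (without the ?) would raise in that case.
  {md[1]?, md[4]?}
end

p year_groups("2015-05-25") # => {"2015", nil}
p year_groups("2016/06/26") # => {nil, "2016"}
```

Only one of the two year groups is ever set, which is why the fixed solution chains the nilable form with || and ends on a non-nilable one.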
Solution 5
def parse_date(s)
  case s
  when %r[(\d\d\d\d)-(\d\d)-(\d\d)|(\d\d\d\d)/(\d\d)/(\d\d)|(\d\d)/(\d\d)/(\d\d\d\d)]
    [($1? || $4? || $9).to_i, ($2? || $5? || $8).to_i, ($3? || $6? || $7).to_i]
  end
end
Knowing what we know, we can use the same trick, rewriting ($1 || $4 || $9) into ($1? || $4? || $9) and so on.
Solution 6
def parse_date(s)
  case s
  when
    %r[(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)],
    %r[(?<year>\d\d\d\d)/(?<month>\d\d)/(?<day>\d\d)],
    %r[(?<day>\d\d)/(?<month>\d\d)/(?<year>\d\d\d\d)]
    [$~["year"].to_i, $~["month"].to_i, $~["day"].to_i]
  end
end
Using named captures works identically to the Ruby version.
Solution 7
def parse_date(s)
  case s
  when %r[(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)|(?<year>\d\d\d\d)/(?<month>\d\d)/(?<day>\d\d)|(?<day>\d\d)/(?<month>\d\d)/(?<year>\d\d\d\d)]
    [$~["year"].to_i, $~["month"].to_i, $~["day"].to_i]
  end
end
Having capture groups with the same name works just like in Ruby without changes.
Solution 8
def parse_date(s)
  case s
  when %r[
    (?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d) |
    (?<year>\d\d\d\d)/(?<month>\d\d)/(?<day>\d\d) |
    (?<day>\d\d)/(?<month>\d\d)/(?<year>\d\d\d\d)
    ]x
    [$~["year"].to_i, $~["month"].to_i, $~["day"].to_i]
  end
end
And so does the //x flag. Everything just works.
Solution 9
def parse_date(s)
  if %r[
    (?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d) |
    (?<year>\d\d\d\d)/(?<month>\d\d)/(?<day>\d\d) |
    (?<day>\d\d)/(?<month>\d\d)/(?<year>\d\d\d\d)
    ]x =~ s
    [year.to_i, month.to_i, day.to_i]
  end
end
This, on the other hand, is completely unsupported. The only side effect of a regular expression match is setting the $~ variable ($1 is just an alias for $~[1], $1? for $~[1]?, etc.). A regular expression match can't set other local variables.
I'm not really comfortable with this Ruby feature, so it's not surprising it didn't find its way here.
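For completeness, here is one way the same idea could be written in Crystal without relying on implicit local variables. This is only a sketch: the constant name DATE_RE is my own choice, and the approach simply binds the match data to a local variable and reads the named groups from it explicitly.

```crystal
# DATE_RE is a hypothetical name chosen for this sketch.
DATE_RE = %r[
  (?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d) |
  (?<year>\d\d\d\d)/(?<month>\d\d)/(?<day>\d\d) |
  (?<day>\d\d)/(?<month>\d\d)/(?<year>\d\d\d\d)
]x

def parse_date(s)
  # String#match returns the match data or nil, so the if both tests
  # for a match and gives us a non-nil md inside the branch.
  if md = s.match(DATE_RE)
    [md["year"].to_i, md["month"].to_i, md["day"].to_i]
  end
end
```

It is a bit more verbose than the Ruby version, but it makes it obvious where year, month, and day come from.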
Story so far
Everything just worked with no or minimal changes. This is my usual experience with Crystal. Things just work most of the time.
Coming next
In the next episode we'll see how other languages handle this problem.