Open Source Adventures: Episode 75: Issues with Crystal Char type

While I was writing puzzle game solvers in Crystal with Z3, the issue I ran into more than anything else was the existence of the Char type.

Many popular languages, such as Ruby, Python, and JavaScript, don't have a character type. "foo"[0] in those languages is an "f" - a String that just so happens to be one character long.

Having a separate Char hugely complicates APIs. I can see why it could be a useful thing for performance, but the complexity cost is real.

Why a character type is problematic in general

The main reason is that in the Unicode world, a lot of operations you might intuitively think would work on characters actually don't. But they work on strings just fine.

Uppercasing is just one such operation out of many. Here's Crystal:

puts "ß".upcase # outputs correctly uppercased SS
puts 'ß'.upcase # outputs lowercase ß

Ouch!

There are many situations where uppercasing a length-1 string results in a string longer than one character.

So a language with a separate character type has a choice: either don't support any such operations on characters (which would be a huge pain), or implement them not quite correctly (like Crystal does).
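Here's a minimal sketch of that trade-off. I'm assuming a reasonably recent Crystal, where Char#upcase also has a block form that yields every character of the uppercase mapping; the variable name upper is just for illustration.

```crystal
# String#upcase is free to return a longer string.
puts "ß".upcase       # SS
puts "ß".upcase.size

# Char#upcase without a block has to return exactly one Char,
# so it cannot represent a multi-character result.
puts 'ß'.upcase       # ß

# The block form yields each character of the mapping instead,
# so the result can be collected into a String.
upper = String.build do |io|
  'ß'.upcase { |ch| io << ch }
end
puts upper
```

So the correct operation is available on Char, but only through an API that looks nothing like the obvious one, which is pretty much the complexity cost I was talking about.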

Crystal-specific issues

In Crystal I was writing a lot of code like c == "." or c =~ /[0-9]/, where c is a Char that came from iterating over a String.

The problem here is that these expressions simply return false or nil, and the compiler does not complain about any type issues. So I have code that looks perfectly fine, that would run perfectly fine in Ruby and most other languages, and that the type checker accepts without complaint, and yet it is statically wrong.
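To make the failure mode concrete, here's a small sketch. The helper names (count_dots_buggy and so on) and the sample row are made up for illustration; the buggy versions compile without errors but never match anything.

```crystal
# Buggy: compares a Char with a String, which is always false.
def count_dots_buggy(row : String) : Int32
  n = 0
  row.each_char { |c| n += 1 if c == "." }
  n
end

# Fixed: compares a Char with a Char literal.
def count_dots_fixed(row : String) : Int32
  n = 0
  row.each_char { |c| n += 1 if c == '.' }
  n
end

# Buggy: Char has no regexp matching, so the generic =~ returns nil.
def count_digits_buggy(row : String) : Int32
  n = 0
  row.each_char { |c| n += 1 if c =~ /[0-9]/ }
  n
end

# Fixed: use a Char predicate, or convert to a String before matching.
def count_digits_fixed(row : String) : Int32
  n = 0
  row.each_char { |c| n += 1 if c.ascii_number? }
  n
end

def count_digits_via_string(row : String) : Int32
  n = 0
  row.each_char { |c| n += 1 if c.to_s.matches?(/[0-9]/) }
  n
end

row = "1.2.."
puts count_dots_buggy(row)        # 0
puts count_dots_fixed(row)        # 3
puts count_digits_buggy(row)      # 0
puts count_digits_fixed(row)      # 2
puts count_digits_via_string(row) # 2
```

The fixes are all easy once you know about them. The annoying part is that nothing tells you the buggy versions are wrong.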

So here are some questions:

Should Crystal have exposed Char in the first place? If I were designing a language, I wouldn't add such a type, or I'd just have an internal one not exposed in regular APIs. Obviously that would be a huge change, so I doubt this would even be a consideration at this point.

Should "a" == 'a'? Sure, they're different types, but 420 == 420.0 is true even though they're different types too, so it's not inherently impossible. I'm not sure what the implications would be here.

Should Char =~ Regexp match it as if it was a length-one String? I'd say probably yes to this one. I'm not seeing a big downside, it has a very obvious meaning, and it's awkward to express otherwise.

Should == or =~ with mismatching types pass type check? Obviously yes, due to union types. If x is String | Nil, then x == nil has to type check, which means "foo" == nil must be valid code. The same argument applies to =~.
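A quick sketch of why the union type forces this. The describe method is a made-up example; the point is that both comparisons have to compile for every member of String | Nil, including the combinations that can never match.

```crystal
# x is String | Nil. Comparing with == nil does not narrow the type,
# so the =~ below is still checked against both String and Nil.
def describe(x : String?) : String
  if x == nil                  # for a String this is "foo" == nil, always false
    "missing"
  elsif x =~ /\A[0-9]+\z/      # for nil this is nil =~ regex, always nil
    "number"
  else
    "text"
  end
end

puts describe(nil)    # missing
puts describe("42")   # number
puts describe("foo")  # text
```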

Should == or =~ with types that cannot match produce a warning? Now here's an interesting question. If we statically know that a == b or a =~ b will be false/nil due to the types of a and b, the odds are good that it's a programmer error, not intended code. And it doesn't seem like a terribly complicated analysis to do. So should Crystal warn in such cases? Like with all warnings, that's mainly a question of false positive rate, as overly aggressive linters are a huge pain.

Coming next

OK, that's enough Crystal for now. In the next episode we'll try another technology as promised.