100 Languages Speedrun: Episode 52: Perl

Perl is a highly influential "kitchen sink" language. Languages like that see an idea that might potentially be useful or just cool, and just add it to the language because why not.

Other languages with a similar design philosophy (or one might say lack of design philosophy) are C++ and Scala. And it's not a coincidence that kitchen sink languages place really high on the charts of most hated languages of all time. They can be quite productive to write code in, but reading someone else's code, not so much.

I very much support the "kitchen sink" language design, including new ones like Raku. Among all those features there are usually a few gems that earn their place in more mainstream languages. For example Perl spearheaded first class regular expression support, and that's now simply common sense. The same goes for such simple things as hashes (or "dictionaries") with hash literal syntax, or string interpolation (somewhat limited in Perl). These features proved themselves and are now everywhere.

To celebrate Perl's weirdness, this episode will be mainly about the weird parts of Perl that never got far.

I won't get too much into the WATs. Perl is the WAT factory like no other, and it gets very well deserved criticism for that.

FizzBuzz

But first, the FizzBuzz.

#!/usr/bin/perl

# $\ specifies what gets printed at the end of print automatically
$\ = "\n";

# If we don't specify any variable, Perl will use topic variable $_
for (1..100) {
  if ($_ % 3 == 0 && $_ % 5 == 0) {
    print "FizzBuzz"
  } elsif ($_ % 3 == 0) {
    print "Fizz"
  } elsif ($_ % 5 == 0) {
    print "Buzz"
  } else {
    # print also defaults to printing topic variable
    # (followed by $\ as always)
    print
  }
}

This already demonstrates a lot of Perl's unusual features:

  • variables have sigils (prefixes) - $ means scalar, @ means array, % means hash (dictionary), and there are a few less common ones.
  • special variables like $\ can control a lot of Perl's behavior
  • $_ topic variable to save you some typing - this feature is seen to limited degree in many languages
  • ranges 1..100 go from 1 to 100 as they ought to, without the +1 weirdness. You can do 1...100 in Perl too, but that just means the same thing as 1..100. In Ruby 1...100 is same as 1..99.

Say Hello

Let's write code that does this:

$ ./hello.pl
Hello Alice Smith
Hello ALICE SMITH! 🎉🎉🎉
Hello Alice Smith

#!/usr/bin/perl

$\ = "\n";

# Say hello to %person
sub say_hello {
  # Interpolation only works with variables and
  # a few expressions like $variable{key}
  print "Hello $person{name} $person{surname}";
}

# Hash variable
%person = (
  name => "Alice",
  surname => "Smith",
);

say_hello;

# Be more excited this time!
# local changes are reverted once we exit the block
do {
  local $\ = "! 🎉🎉🎉\n";
  local $person{name} = uc$person{name};
  local $person{surname} = uc$person{surname};
  say_hello;
};

# Back to the usual
say_hello;

There's a lot going on here!

%person is a hash variable describing a person. However its elements are not hashes, they're just scalars, so the name of the person is $person{name} and the surname is $person{surname}. People find this sigil system very nonintuitive, and in Raku it switched to %person{"name"}.

One quite unusual feature in Perl is "dynamic scoping" - we can define something local to a block - it works sort of like a global variable, but it gets reverted to what it was before once the block ends.

This way we can change line ending $\ (and honestly without dynamic scoping, those globals changing stuff all over the place would be really bad). But we can also change individual elements of a hash, or current ENV, or many other things.

Perl also has the usual lexically scoped local variables, declared with the my keyword. And a few other kinds, obviously.
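To make the difference between local and my concrete, here's a minimal sketch. The names $greeting, show, with_local, and with_my are made up for this illustration, and our is only there so the global works under use strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A package (global) variable; "our" lets us use it under strict
our $greeting = "Hello";

# Reads whatever value the global has at the moment it is called
sub show { return "$greeting, world" }

sub with_local {
  # Dynamic scoping: functions we call see the temporary value
  local $greeting = "Howdy";
  return show();
}

sub with_my {
  # Lexical scoping: a brand new variable, invisible to show()
  my $greeting = "Howdy";
  return show();
}

print with_local(), "\n"; # Howdy, world
print with_my(), "\n";    # Hello, world
print show(), "\n";       # Hello, world (the local change was reverted)
```

The local version changes what show sees for the duration of the call, the my version does not.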

Contexts

The $, @, and so on are not some tiny things. They're actually core to how Perl works. Everything in Perl is in "scalar context" or "list context" (or one of the other contexts).

Here's an example:

#!/usr/bin/perl

$first_person = <STDIN>;
@other_people = <STDIN>;

chomp $first_person;
chomp @other_people;

print "Special welcome to $first_person!\n";
print "Also welcome to ", join(", ", @other_people), "!\n";

And here's what it does:

$ ./contexts.pl
Alice
Bob
Charlie
Dave
Special welcome to Alice!
Also welcome to Bob, Charlie, Dave!

<STDIN> reads lines from STDIN. Annoyingly they always come with the extra \n, and there are no special variables to chop that off, which is such a weird omission. But <STDIN> does a different thing depending on being in scalar context or list context.

When we use it in scalar context $first_person = <STDIN> - it reads one line. When we use it in list context @other_people = <STDIN> - it reads all the remaining lines.

Many APIs have pairs of functions like getOneX and getManyXs. Perl can collapse such a pair into one function with some context awareness.

Something vaguely similar was done by jQuery where $(selector) could be used to return one thing or many, while modern browser APIs turned that into .querySelector and .querySelectorAll, but jQuery was based on completely different principles.

If you want your function to support contexts you can check the wantarray keyword, which returns true for list context, false for scalar context, and undef for void context, when the value is not used. Perl documentation also lists two other contexts, because things are always more complicated than they first seem in Perl.
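As a rough sketch of how that looks in practice, here's a made-up words function (not from any library) that returns every word in list context and only the first one in scalar context:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: all words in list context,
# just the first word in scalar context
sub words {
  my ($text) = @_;
  my @words = split ' ', $text;
  return unless defined wantarray;  # void context: nothing to return
  return wantarray ? @words : $words[0];
}

my @all   = words("Alice Bob Charlie");  # list context
my $first = words("Alice Bob Charlie");  # scalar context

print "All: @all\n";     # All: Alice Bob Charlie
print "First: $first\n"; # First: Alice
```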

Golf

Code Golf is a competition to write a program to do a given task in the fewest characters. Before custom languages for golfing got created, it was dominated by Perl, Ruby, and occasionally APL.

Here's such "golfed" code for FizzBuzz, from a Code Golf site:

print+(Fizz)[$_%3].(Buzz)[$_%5]||$_,$/for 1..100

For some explanations:

  • words without quotes are treated as strings if there's no better interpretation, so (Fizz) is a list of one string ("Fizz").
  • $_%3 returns 0, 1, or 2 depending on remainder of $_ modulo 3
  • so (Fizz)[$_%3] returns "Fizz" or undef
  • and likewise (Buzz)[$_%5] returns "Buzz" or undef
  • . is string concatenation and undefined values become empty strings, so (Fizz)[$_%3].(Buzz)[$_%5] returns "Fizz", "Buzz", "FizzBuzz", or ""
  • stuff||$_ means stuff if it's true, otherwise $_. As empty string is false in Perl, it gets us "Fizz", "Buzz", "FizzBuzz", or $_, as by FizzBuzz rules
  • that extra + is a precedence hack to save on some parentheses
  • $/ is \n by default
  • so we have print(fizz_buzz_stuff, $/) for 1..100 or for (1..100) { print(fizz_buzz_stuff, "\n") }
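For comparison, the steps above can be spelled out one per line. This is just an ungolfed sketch of the same idea; the fizzbuzz function name is made up, and the // "" is only there to keep use warnings quiet about undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as the golfed one-liner, one step per line
sub fizzbuzz {
  my ($n) = @_;
  # A one-element list indexed by the remainder:
  # index 0 gives the word, any other index gives undef
  my $fizz = ("Fizz")[$n % 3];
  my $buzz = ("Buzz")[$n % 5];
  # // "" turns undef into an empty string without a warning
  my $word = ($fizz // "") . ($buzz // "");
  # Empty string is false, so fall back to the number
  return $word || $n;
}

print fizzbuzz($_), "\n" for 1..100;
```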

As far as golfs go, it's not too bad.

Weirdly Ruby is about as good for golfing as Perl, without any of the readability issues.

Rename

I still use Perl for one thing on a regular basis, and that's the rename script, which used to be bundled with most Linux distributions, and which I included in my unix-utilities package.

rename takes a Perl script as argument, and then a list of file names. Then it runs that Perl script, with $_ set to the file name. If it changed, it then renames the file.

It of course does sensible things, like dry run mode, verbose mode, checking that it won't accidentally overwrite things, and so on.
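The core idea is small enough to sketch. This is not the actual rename script, just a simplified illustration with a made-up plan_renames function that only reports what would change, like a dry run:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dry-run helper: runs $code with $_ set to each file name
# and returns [old, new] pairs for names that changed. Nothing is renamed.
sub plan_renames {
  my ($code, @files) = @_;
  my @plan;
  for my $file (@files) {
    local $_ = $file;
    eval $code;          # string eval: only do this with code you trust
    die $@ if $@;
    push @plan, [$file, $_] if $_ ne $file;
  }
  return @plan;
}

for my $pair (plan_renames('s/ /_/g', "my notes.txt", "todo.txt")) {
  print "$pair->[0] -> $pair->[1]\n";  # my notes.txt -> my_notes.txt
}
```

A real tool would then call Perl's builtin rename on each pair, after checking that the target doesn't already exist.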

Some random examples of rename:

Replace all spaces by underscores:

$ rename 's/ /_/g' *.txt

Flatten nested directory structure by one level:

$ rename 's!/! - !' */*

Rename all .txt to .md:

$ rename 's/\.txt$/.md/' *.txt

And so on. Most of the time a single regexp replace will do, but sometimes you can run real code there. And for such cases rename --dry-run is amazing.

Autovivification

Normally if you want to build up something iteratively, you need to initialize it first to an empty value. Not in Perl. Because each variable knows if it's a scalar, array, or hash; and each operation knows if it's a string or number operation, Perl can initialize things automatically.

For example in this script:

#!/usr/bin/perl

while(<>) {
  $counts{lc$_}++ for /\w+/g;
}

my @top = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;
for (@top[0..9]) {
  print "$_: $counts{$_}\n";
}

And we can see the top ten words in the KJV Bible:

$ curl -s https://www.gutenberg.org/cache/epub/10/pg10.txt | ./wordcount.pl
the: 64305
and: 51762
of: 34843
to: 13680
that: 12927
in: 12727
he: 10422
shall: 9840
for: 8997
unto: 8997

There are so many interesting things going on here:

  • autovivification with $counts{lc$_}++ - we didn't have to do %counts={} and $counts{lc$_} ||= 0 like we would in most other languages
  • in Perl scalars work as strings or numbers depending on context, which makes things awkward for sorting. Inside sort{ } $a and $b are the elements being compared. In this case we compare values numerically with <=> (which returns -1, 0, or +1), and then compare keys as strings with cmp (which also returns -1, 0, or +1). The || only runs the right side if the left is false, and 0 (meaning equal) is false, so keys are only compared when the counts are tied. It can work, but I much prefer the Ruby version counts.sort_by{|k,v| [-v, k]}.
  • $top[...] is one element of @top, but @top[...] is a list of elements, corresponding to list of indexes we pass.
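Here's the same idea on a tiny input, as a sketch with a made-up count_words function. It shows autovivification, a single element access, an array slice, and (not used in the script above) the equivalent hash slice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count words in a string, relying on autovivification
sub count_words {
  my ($text) = @_;
  my %counts;                       # starts empty, no per-key setup needed
  $counts{lc $_}++ for $text =~ /\w+/g;
  return %counts;
}

my %counts = count_words("the cat and the hat and the bat");
my @top = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;

print "Top word: $top[0]\n";              # one element: the
print "Top two: @top[0, 1]\n";            # array slice: the and
print "Counts: @counts{'the', 'and'}\n";  # hash slice: 3 2
```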

Function Prototypes

Perl is really committed to not having to do parentheses. For example you can declare that a function takes exactly one scalar argument with ($). Take a look:

#!/usr/bin/perl

sub titleize ($) {
  my ($word) = (@_);
  $word = lc$word;
  $word =~ s/\b./uc$&/eg;
  $word;
}

print "Hello ", titleize "alice SMITH", "!\n";

$ ./prototypes.pl
Hello Alice Smith!

Thanks to the prototype, Perl knows what you meant was this:

print("Hello ", titleize("alice SMITH"), "!\n");

And not this:

print("Hello ", titleize("alice SMITH", "!\n"));

A lot of Perl builtin functions behave like this, including obviously uc and lc.

This is something even Ruby and Raku do not attempt. Ruby achieves its minimal parentheses count by making such one argument functions into methods you can unambiguously call with .method_name:

print "Hello ", "alice SMITH".titleize, "!\n"

Another thing to notice here is that Perl functions don't have argument lists. They just get @_ as the argument list, and it's up to them to unpack it. Very often the first line of a function is my ($arg1, $arg2, @rest) = @_ or similar.
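A small sketch of that unpacking pattern, with a made-up greet function taking one required argument, one optional argument with a default, and any number of extras:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical function: one required argument, one optional, then the rest
sub greet {
  my ($greeting, $name, @extras) = @_;
  $name //= "stranger";             # default when no name was passed
  my $line = "$greeting, $name";
  $line .= " (and " . join(", ", @extras) . ")" if @extras;
  return "$line!";
}

print greet("Hello"), "\n";                          # Hello, stranger!
print greet("Hello", "Alice"), "\n";                 # Hello, Alice!
print greet("Hello", "Alice", "Bob", "Carol"), "\n"; # Hello, Alice (and Bob, Carol)!
```

Nothing checks the argument count here; a missing argument simply arrives as undef.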

Flipping Language Features

Perl has a lot of defaults, like doing calculations on floating point numbers, but it's really happy to offer alternatives, which you can select with a lexically scoped use statement:

#!/usr/bin/perl

$\="\n";

print 1/7;

{
  use bigrat;
  print 1/7;
};

print 1/7;

$ ./bigrat.pl
0.142857142857143
1/7
0.142857142857143

Until recently it offered completely insane $[ which determined if arrays start from 0, from 1, or from something dumb. That got understandably removed.

Unix Integration

Perl was always meant to completely replace Unix shell scripts. It has absolutely phenomenal Unix integration. Here are just some examples:

#!/usr/bin/perl

$os = `uname -ms`;
chomp $os;

{
  local $ENV{LC_ALL} = 'ru_RU';
  $date = `date`;
  chomp $date;
};

print "You're on $os\n";
print "In Russian, date is $date\n";

print "Number of characters in numbers 1 to 1_000_000 is: ";
open(F, "|wc -c");
print F $_ for 1..1_000_000;
close F;

$ ./system.pl
You're on Darwin x86_64
In Russian, date is вторник, 11 января 2022 г. 06:31:54 (GMT)
Number of characters in numbers 1 to 1_000_000 is:  5888896

As you can see:

  • backticks to get output of a simple command
  • you can change ENV by modifying %ENV - and something other languages don't really provide, you can make those changes scoped so they get restored when you exit the block
  • you can open pipes from or to your program just like you'd open a file - for bidirectional communication you'd need to use a module

Only Ruby and Raku fully endorsed this, and Ruby doesn't have the local ENV trick. On the other hand, Perl doesn't have an equivalent of Ruby's block-scoped local directory change with Dir.chdir{ ... }. In Perl if you change directory with chdir you need to restore it manually.
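Restoring the directory manually could look something like this. It's only a sketch: in_dir is a made-up helper, and it uses getcwd from the core Cwd module to remember where we started.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper imitating Ruby's Dir.chdir { ... }:
# run $code inside $dir, then go back to where we started
sub in_dir {
  my ($dir, $code) = @_;
  my $old = getcwd();
  chdir $dir or die "Cannot chdir to $dir: $!";
  my @result = eval { $code->() };  # catch errors so we can still go back
  my $error = $@;
  chdir $old or die "Cannot chdir back to $old: $!";
  die $error if $error;
  return @result;
}

my ($inside) = in_dir("/", sub { getcwd() });
print "Inside the block: $inside\n";   # /
print "Afterwards: ", getcwd(), "\n";  # wherever the script was started
```

The helper is meant to be called in list context, as in the example.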

Flip Flop Operator

A flip flop is a pair of conditions. When the first is true, it turns on the flip flop. When the second is true, it turns it off. So there's a bit of hidden state.

Well, let's parse some HTML with regular expressions, and extract all links from a head section of an HTML:

#!/usr/bin/perl

$\="\n";

open F, "curl -s https://en.wikipedia.org/wiki/Perl |";
while (<F>) {
  if (/<head>/ .. /<\/head>/) {
    print $1 for /href="(.*?)"/g;
  }
}

$ ./flipflop.pl
/w/load.php?lang=en&amp;modules=ext.cite.styles%7Cext.pygments%2CwikimediaBadges%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cjquery.makeCollapsible.styles%7Cskins.vector.styles.legacy%7Cwikibase.client.init&amp;only=styles&amp;skin=vector
/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector
//upload.wikimedia.org
//en.m.wikipedia.org/wiki/Perl
/w/index.php?title=Perl&amp;action=edit
/static/apple-touch/wikipedia.png
/static/favicon/wikipedia.ico
/w/opensearch_desc.php
//en.wikipedia.org/w/api.php?action=rsd
https://creativecommons.org/licenses/by-sa/3.0/
https://en.wikipedia.org/wiki/Perl
//meta.wikimedia.org
//login.wikimedia.org

If we had a lot of HTML documents, the flip flop would turn on whenever a <head> is matched, and turn off whenever </head> is matched. It's basically a shortcut notation for saying:

#!/usr/bin/perl

$\="\n";

open F, "curl -s https://en.wikipedia.org/wiki/Perl |";
$in_head = 0;
while (<F>) {
  $in_head = 1 if /<head>/;
  $in_head = 0 if /<\/head>/;
  if ($in_head) {
    print $1 for /href="(.*?)"/g;
  }
}

Arguably a flip flop expresses it more cleanly than a state variable and some statements to manage it. Or maybe it doesn't.
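To see the on and off behavior without any network access, here's a small sketch on an in-memory list. The between_markers function and the BEGIN/END marker words are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines from BEGIN to END markers, inclusive
sub between_markers {
  my @kept;
  for (@_) {
    # Turns on at a line matching BEGIN, turns off after a line matching END
    push @kept, $_ if /BEGIN/ .. /END/;
  }
  return @kept;
}

my @lines = ("intro", "BEGIN", "one", "END", "middle", "BEGIN", "two", "END", "outro");
print "$_\n" for between_markers(@lines);
```

This prints BEGIN, one, END, BEGIN, two, END, each on its own line. Both marker lines are included, and the flip flop turns on again for the second section.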

This feature has a good amount of controversy behind it. Like many Perl features it found its way into Ruby, but you'd be hard pressed to find it used much in typical Ruby code. At one point Ruby tried to deprecate it with the goal of removing it, but it managed a rare feat and got itself un-deprecated.

Should you use Perl?

Probably not.

I tried to show Perl in the best light here, and even that wasn't possible without running into a good number of WATs. If you look at real Perl code, there's a really high WAT rate. At the time it could have been argued the WATs are a price worth paying for expressiveness, and it's still way better than shell scripting, but most languages after Perl took many of Perl's best parts, without the WATs.

Perl has two main spiritual successors, which took different lessons from Perl - Ruby and Raku. Ruby took the "lessons learned" approach, kept the good stuff somewhat selectively, also good stuff from Smalltalk and other languages, and created a thing of beauty. Raku took the opposite "build a better kitchen sink" approach, cleaned up some stuff that clearly wasn't working, and instead piled up a lot of completely new untested ideas, to get a hopefully highly expressive mess. Depending on why you wanted to use Perl, one or the other might be more appealing.

Other modern languages like Python (or even somewhat JavaScript with Node) are not quite doing what Perl was aiming at, but they're generally adequate as replacement for shell scripting, and they have an advantage that you might already know them.

Perl is one of the best languages for code golfing, but Ruby is about equally good at it, while being so much more useful overall. And nowadays golfing-specific languages are also very popular.

Overall if you wanted to use Perl for something, I'd recommend Ruby as a first choice replacement, and one of the other languages I mentioned if that's not quite what you want.

While writing this episode I also had quite a few moments when I thought some Perl-style Ruby feature originated in Perl (notably Dir.chdir and %W), but it turned out that it was just Ruby extrapolating from the good parts of Perl. The best Perl of today is no longer Perl itself.

Code

All code examples for the series will be in this repository.

Code for the Perl episode is available here.