100 Languages Speedrun: Episode 45: M4
First, a brief history lesson.
Preprocessors used to be a thing, the most notable being CPP (the C PreProcessor), used by C, C++, and occasionally a few other languages. It started with a fairly simple language like old-style C, where people really wished for more functionality, like constants and including one file in another. But instead of adding all that to the language itself, they'd pass the source code through a separate program like CPP first, and only then hand the result over to the compiler.
This sort of works, but it's really quite terrible. Imagine debugging anything when you cannot see the code the compiler sees: error line numbers are completely mysterious, the preprocessor doesn't know anything about the language, and the language doesn't know anything about preprocessor directives. It's a total mess.
C has been walking back from this mess. Step by step, the "compiler" got extended to get hints from the "preprocessor", the language got extended to support many things (notably enum and const), the "preprocessor" has been integrated with the compiler so you can actually get meaningful error messages, and finally the whole "including file" business was replaced by "precompiled headers". The end result is a hybrid CPP/C or CPP/C++ language that doesn't really have a "preprocessor" anymore, but does its best to pretend it does.
Nowadays if you need to extend the language, you instead use something like Babel for JavaScript, which fully parses the syntax of the language it's translating, so you never get syntax errors from the final language itself - Babel catches them for you. Source maps are used so you get correct line numbers in runtime messages as well. This still isn't perfect, as the in-browser debugger will show you the translated code, but it's so much better than having a bunch of regular expressions translating => to function() { ... } and such.
Lesson learned, preprocessors are bad, don't use them.
Anyway, some people took a very different lesson out of the CPP mess, and decided to instead write a better preprocessor. That's how M4 came to be.
Hello, World!
You probably have m4 already installed, as it's used by some abominations like GNU autoconf.
Let's start with a Hello, World!
dnl Hello, World! in M4
define(`hello', `Hello, World!')dnl
hello
We can now preprocess our text:
$ m4 < hello.m4
Hello, World!
- dnl is like a comment - it skips the rest of the line, including the newline
- define(...) defines a macro - in this case it's a very simple macro
- even though we have the define(...) directive on line two, m4 is not thinking in terms of lines, so it would print everything after the closing ), newline included, so we need to end every definition with an ugly dnl, without spaces in between
- notice the unusual quoting syntax with an opening backtick and a closing single quote - this allows quotes to be nested
Macro arguments
We can pass string arguments to macros, they'll be available as $1, $2, etc.
define(`hello', `Hello, $1!')dnl
hello(Alice)
$ m4 <name.m4
Hello, Alice!
Math
M4 has very few builtin macros. It can do basic integer math: eval(expression) returns the result. It doesn't do floating point numbers:
define(`addexample',`$1 + $2 = eval($1+$2)')dnl
addexample(350, 70)
addexample(19, 50)
$ m4 <math.m4
350 + 70 = 420
19 + 50 = 69
Odd Even
M4 can do simple if/else logic. ifelse(A,B,THEN,ELSE) will check if A is the same string as B, and if so, it will return THEN, otherwise it will return ELSE. You can also add more arguments to create if/elsif/elsif/else chains.
define(`oddeven',`ifelse(eval($1%2),0,`$1 is even',`$1 is odd')')dnl
oddeven(69)
oddeven(420)
$ m4 <oddeven.m4
69 is odd
420 is even
FizzBuzz
There are no loops in M4, so we do the usual recursion. There's a lot of fiddling to get the newlines right:
define(`fizzbuzz',`ifelse(eval($1%15),0,`FizzBuzz
',eval($1%5),0,`Buzz
',eval($1%3),0,`Fizz
',`$1
')')dnl
define(`fizzbuzzloop',`ifelse(eval($1<=$2),1,`fizzbuzz($1)fizzbuzzloop(eval($1+1),$2)')')dnl
fizzbuzzloop(1,100)dnl
The M4 documentation provides a generic forloop(var, from, to, statement), but it needs a lot more complex code so that var can be defined and available in the statement.
You shouldn't be too surprised by what it does:
$ m4 <fizzbuzz.m4
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
...
Buzz
Fizz
97
98
Fizz
Buzz
Numbered list
Let's try to use M4 to handle list numbering automatically for us:
define(`startlist',`define(`listcounter',1)')dnl
define(`nextlistcounter',`define(`listcounter',eval(1+listcounter))')dnl
define(`item', `* listcounter. $1 nextlistcounter')dnl
Most popular animals:
startlist
item(Cats)
item(Dogs)
item(Fish for some reason, boring)
item(Birds)
item(Rabbits)
$ m4 <list.m4
Most popular animals:
* 1. Cats
* 2. Dogs
* 3. Fish for some reason
* 4. Birds
* 5. Rabbits
It sort of works, but it's defines doing defines, and again, spacing is likely not what you'd like it to be - like the extra space at the end of each list item line. M4 is probably a lot more sensible in places where you really don't care about all that extra spacing. It's sort of possible to control spacing, but it really increases the complexity of the code.
Footnotes
To get output out of order, M4 has divert functionality. divert(number) diverts the output to temporary buffer number. divert with no arguments resumes normal output. Then you can call undivert(number) to get it all back.
define(`footnotecounter',`1')dnl
define(`nextfootnotecounter',`define(`footnotecounter',eval(1+footnotecounter))')dnl
define(`footnote',`[footnotecounter]divert(1)[footnotecounter] $1 nextfootnotecounter
divert')dnl
define(`footnotes',`
Footnotes:
undivert')dnl
Preprocessors footnote("like CPP or M4") are terrible for programming footnote("or pretty much anything else").
footnotes
Which outputs:
$ m4 <footnotes.m4
Preprocessors [1] are terrible for programming [2].
Footnotes:
[1] "like CPP or M4"
[2] "or pretty much anything else"
All non-empty diversions are automatically printed at the end of input, in order of their numbers, unless they've been undiverted or discarded before. The divert system is probably the most clever part of M4.
Running system commands
This is perhaps not something you'd expect from a preprocessor, but M4 can run any system command. This goes against common security assumptions. Running an untrusted program is obviously dangerous, but most people would assume that compiling or preprocessing untrusted programs (or in the case of M4, just some random text) is fine. Well, not with M4:
define(command,`$ $1
esyscmd($1)')dnl
command(`ping -c 3 8.8.8.8')dnl
$ m4 <cmd.m4
$ ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=19.144 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=8.400 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=8.961 ms
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.400/12.168/19.144/4.938 ms
File includes
Obviously M4 handles file includes as well. Both the text and the definitions in the included file will be included, just as if the text was copy-pasted.
include(name.m4)dnl
hello(Bob)
hello(Carol)
hello(Dave)
$ m4 <include.m4
Hello, Alice!
Hello, Bob!
Hello, Carol!
Hello, Dave!
Should you use M4?
As a preprocessor for a programming language? Definitely no. For other things? Also no.
Preprocessors for programming are inherently a terrible idea, and if a language needs a specific feature, it just needs to get that feature. If it absolutely cannot, you should use a language-aware tool.
For other things, especially if you really don't care about spaces (as handling spaces correctly doubles the complexity of M4 code), it's tempting to use a preprocessor like CPP or M4. Every single time it was done, the result was a total mess. M4 is a very weak language, as you can see from how nasty the code for even those simple things was, so you could get slightly better results with a better preprocessor, but it's really the principle of using any preprocessor not aware of the language being preprocessed that's at fault here.
If you need to quickly hack together some small language, and you're thinking of using preprocessor macros, don't. Many languages, Ruby especially, let you write truly beautiful DSLs with none of a preprocessor's limitations, and you get the full power of a real language when you need it, with proper testing tools.
Code
All code examples for the series will be in this repository.