What is Macro Hygiene?
One important, though surprisingly uncommon, feature of macro systems is that of hygiene. I mentioned in a previous post that I would eventually say something about hygiene. It turns out macro hygiene is somewhat tricky to define precisely, and I know a couple of people who are actively working on a formal definition of hygiene. The intuition behind hygiene isn't too bad though. Basically, we want our macros to not break our code. So how can macros break code?
Recall that macros are basically programs that transform your program's code, rather than runtime values. In doing so, they may introduce new variable bindings. If we're not careful, these new bindings can end up capturing variables in your own code. That is, the new binding might shadow a variable you as the programmer have already created. All of a sudden, the variable you thought you were referring to is no longer the same thing, and because all of this code is hidden in a macro expansion, it will be very hard to figure out what's going on.
Consider the following Scheme macro for or.
(define-syntax or
  (syntax-rules ()
    ((_ e1 e2)
     (let ((t e1))
       (if t t e2)))))
Like a good macro, this uses a temporary variable, t, to avoid calculating e1 twice and potentially duplicating any side effects in that expression. Unfortunately, if we're not careful, this binding can capture an existing binding of t. Consider the following program.
(let ((t 5))
  (or #f t))
If you run this in the Scheme REPL, you should get 5. Let's see what happens if we blindly expand the or macro without regard for hygiene. We would end up with the following program.
(let ((t 5))
  (let ((t #f))
    (if t t t)))
This program evaluates to #f, which is the exact opposite of what we were supposed to get! Expanding our macro has shadowed the binding of t to 5 with a binding of t to #f.
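The blind expansion can be sketched as an ordinary procedure over S-expressions. This is only an illustration: naive-expand-or is a hypothetical helper, not how a real Scheme expander is organized. It substitutes the two operands into the template and does nothing about names.

```scheme
;; Hypothetical helper (not part of any Scheme implementation):
;; build the expansion of (or e1 e2) by plain substitution,
;; leaving the temporary named t exactly as the macro author wrote it.
(define (naive-expand-or e1 e2)
  `(let ((t ,e1))
     (if t t ,e2)))

;; Expanding (or #f t): the operand t is passed in as a quoted symbol.
(naive-expand-or #f 't)
;; => (let ((t #f)) (if t t t))
```

In the result, the user's t and the macro's t are the same symbol, so there is no way to tell them apart once the template has been filled in. That is the capture.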
One way to work around this, which was a common trick for LISP programmers of yore, is to choose variable names that a programmer is unlikely to guess. We could rewrite our or macro like this:
(define-syntax or
  (syntax-rules ()
    ((_ e1 e2)
     (let ((this-is-my-super-secret-name-which-you-will-never-guess e1))
       (if this-is-my-super-secret-name-which-you-will-never-guess
           this-is-my-super-secret-name-which-you-will-never-guess
           e2)))))
This works okay, except it's a lot more typing. Eventually, some overly clever programmer is going to name their own variable this-is-my-super-secret-name-which-you-will-never-guess, and then their program will break in really unexpected ways. It'd really be great if our macro expander could take care of these issues on its own. That way, the macro writers could type less and use names they like, and macro users don't have to worry about their variables being captured unexpectedly.
We could modify the macro expander to automatically rename any variables bound by a macro expansion. In this case, our simple test program would expand as follows, using our first definition of or.
(let ((t 5))
  (let ((t.1 #f))
    (if t.1 t.1 t)))
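As a rough sketch of the renaming step, the expander could draw the temporary's name from a counter each time the macro is expanded. The names fresh-name and renaming-expand-or below are hypothetical, and a real expander would need names that are guaranteed not to clash with anything a user can write, which a simple counter suffix does not give you.

```scheme
;; Hypothetical fresh-name generator: appends a running counter to a base name.
;; Assumption: nobody else writes identifiers of the form base.N.
(define rename-counter 0)

(define (fresh-name base)
  (set! rename-counter (+ rename-counter 1))
  (string->symbol
   (string-append (symbol->string base)
                  "."
                  (number->string rename-counter))))

;; Hypothetical helper: expand (or e1 e2), renaming the binding the macro introduces.
(define (renaming-expand-or e1 e2)
  (let ((tmp (fresh-name 't)))
    `(let ((,tmp ,e1))
       (if ,tmp ,tmp ,e2))))

;; First expansion in a fresh session, so the counter yields 1.
(renaming-expand-or #f 't)
;; => (let ((t.1 #f)) (if t.1 t.1 t))
```

Splicing that result into the enclosing (let ((t 5)) ...) gives the expanded program shown above, where the user's t is no longer shadowed.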
This program evaluates as expected, so things are looking good. But, what if someone writes this program?
(let ((if (lambda (a b c) b)))
  (or #f 5))
We'll use our variable-renaming expander and see that we end up with the following program:
(let ((if (lambda (a b c) b)))
  (let ((t.1 #f))
    (if t.1 t.1 5)))
This program once again evaluates to #f instead of 5 like we'd like it to. This illustrates the second, and more subtle, class of hygiene error. The problem is that the programmer's definition of if has captured the if used by the expansion of or. Now, many languages treat keywords like if specially and don't let you name your variables after them. Enforcing a rule like that in Scheme would solve this particular case, but at the cost of a lot of the power that Scheme programmers love. The proper solution is to find some way of tracking what if was bound to when the or macro was defined and using that version in the expansion of or.
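For comparison, here is how both test programs should behave when the expander does maintain hygiene, as the Scheme standards require of syntax-rules. The macro is named my-or here (a name chosen for this sketch) so that it does not replace the built-in or.

```scheme
;; Same template as the first definition of or, under a hypothetical name.
(define-syntax my-or
  (syntax-rules ()
    ((_ e1 e2)
     (let ((t e1))
       (if t t e2)))))

;; First kind of capture: the macro's t must not shadow the user's t.
(let ((t 5))
  (my-or #f t))
;; evaluates to 5 under a hygienic expander

;; Second kind of capture: the user's if must not replace the if
;; that was in scope where my-or was defined.
(let ((if (lambda (a b c) b)))
  (my-or #f 5))
;; evaluates to 5 under a hygienic expander
```

If an implementation returns #f for either expression, its expander is letting one of the two captures through.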
Properly maintaining hygiene in macro systems turns out to be really tricky. The reward is worth it, however, as programmers can then reason much more easily about the behavior of macros and start to rely on them in large projects. Macros are a powerful feature of programming languages, and many newer languages have some form of macro system. Sadly, these are often not hygienic, or their designers say "Oh, we'll add hygiene later." Given how important hygiene is, and how tricky it is to get right, language designers should really implement hygiene in their macro systems from the very beginning.