What is Macro Hygiene?

One important, though surprisingly uncommon, feature of macro systems is hygiene. I mentioned in a previous post that I would eventually say something about it. It turns out macro hygiene is somewhat tricky to define precisely, and I know a couple of people who are actively working on a formal definition. The intuition behind hygiene isn't too bad though. Basically, we want our macros to not break our code. So how can macros break code?

Recall that macros are basically programs that transform your program's code, rather than runtime values. In doing so, they may introduce new variable bindings. If we're not careful, these new bindings can end up capturing variables in your own code. That is, the new binding might shadow a variable you as the programmer have already created. All of a sudden, the variable you thought you were referring to is no longer the same thing, and because all of this code is hidden in a macro expansion, it will be very hard to figure out what's going on.

Consider the following Scheme macro for or.

(define-syntax or
  (syntax-rules ()
    ((_ e1 e2)
     (let ((t e1))
       (if t t e2)))))

Like a good macro, this uses a temporary variable, t, to avoid calculating e1 twice and potentially duplicating any side effects in that expression. Unfortunately, if we're not careful, this binding can capture an existing binding of t. Consider the following program.

(let ((t 5))
  (or #f t))

If you run this in a Scheme REPL, you should get 5, because syntax-rules macros are hygienic. Let's see what happens if we blindly expand the or macro without regard for hygiene. We would end up with the following program.

(let ((t 5))
  (let ((t #f))
    (if t t t)))

This program evaluates to #f rather than the 5 we were supposed to get! Expanding our macro has shadowed the binding of t to 5 with a binding of t to #f, so the t we passed as the second argument now refers to the macro's temporary instead of our own variable.
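To make the blind expansion concrete, here is a minimal sketch of a naive expander written as an ordinary procedure over quoted lists. The names naive-expand and naive-expand-or are made up for illustration, and the sketch assumes every form is a proper list and that or always takes exactly two arguments. It is not how a real Scheme expander works.

```scheme
;; Hypothetical helper: rewrite (or e1 e2) by pasting the arguments
;; into the template, with no attempt to avoid name clashes.
(define (naive-expand-or form)
  (let ((e1 (cadr form))
        (e2 (caddr form)))
    `(let ((t ,e1))
       (if t t ,e2))))

;; Hypothetical helper: walk a quoted program and expand every use of or.
;; Assumes all forms are proper lists; it does not treat quote specially.
(define (naive-expand form)
  (cond ((not (pair? form)) form)
        ((eq? (car form) 'or)
         (naive-expand-or (map naive-expand form)))
        (else (map naive-expand form))))
```

Calling (naive-expand '(let ((t 5)) (or #f t))) should produce the captured expansion shown above, with the macro's t shadowing ours.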

One way to work around this, which was a common trick for LISP programmers of yore, is to choose variable names that a programmer is unlikely to guess. We could rewrite our or macro like this:

(define-syntax or
  (syntax-rules ()
    ((_ e1 e2)
     (let ((this-is-my-super-secret-name-which-you-will-never-guess e1))
       (if this-is-my-super-secret-name-which-you-will-never-guess
           this-is-my-super-secret-name-which-you-will-never-guess
           e2)))))

This works okay, except it's a lot more typing. Eventually, some overly clever programmer is going to name their own variable this-is-my-super-secret-name-which-you-will-never-guess, and then their program will break in really unexpected ways. It'd really be great if our macro expander could take care of these issues on its own. That way, the macro writers could type less and use names they like, and macro users don't have to worry about their variables being captured unexpectedly.

We could modify the macro expander to automatically rename any variables bound by a macro expansion. In this case, our simple test program would expand as follows, using our first definition of or.

(let ((t 5))
  (let ((t.1 #f))
    (if t.1 t.1 t)))

This program evaluates to 5 as expected, so things are looking good. But what if someone writes this program?

(let ((if (lambda (a b c) b)))
  (or #f 5))

We'll use our variable-renaming expander and see that we end up with the following program:

(let ((if (lambda (a b c) b)))
  (let ((t.1 #f))
    (if t.1 t.1 5)))

This program once again evaluates to #f instead of 5 as we'd like it to. This illustrates the second, and more subtle, class of hygiene error. The problem is that the programmer's definition of if has captured the if used by the expansion of or. Now, many languages treat keywords like if specially and don't let you name your variables after them. Enforcing a rule like that in Scheme would solve this particular case, but at the cost of a lot of the power that Scheme programmers love. The proper solution is to find some way of tracking what if was bound to when the or macro was defined and using that version in the expansion of or.
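The variable-renaming expander can be sketched in the same spirit, as a procedure over quoted lists. Everything here is an assumption made for illustration: the names fresh-name, renaming-expand-or and renaming-expand, the global counter, and the name.N naming scheme. The point is only to show that renaming the temporary leaves the reference to if untouched.

```scheme
;; Hypothetical counter used to build names like t.1, t.2, ...
(define rename-counter 0)

;; Hypothetical helper: return a symbol that has not been handed out before.
(define (fresh-name base)
  (set! rename-counter (+ rename-counter 1))
  (string->symbol
   (string-append (symbol->string base)
                  "."
                  (number->string rename-counter))))

;; Rewrite (or e1 e2), renaming only the binding the macro introduces.
(define (renaming-expand-or form)
  (let ((e1 (cadr form))
        (e2 (caddr form))
        (tmp (fresh-name 't)))
    `(let ((,tmp ,e1))
       (if ,tmp ,tmp ,e2))))

;; Walk a quoted program and expand every use of or.
;; Assumes all forms are proper lists; it does not treat quote specially.
(define (renaming-expand form)
  (cond ((not (pair? form)) form)
        ((eq? (car form) 'or)
         (renaming-expand-or (map renaming-expand form)))
        (else (map renaming-expand form))))
```

The temporary gets a fresh name, but the if in the template is still just the symbol if, so whatever binding of if surrounds the use of or will capture it. Fixing that requires the expander to remember where each identifier in the template came from, not just to rename the ones it binds.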

Properly maintaining hygiene in macro systems turns out to be really tricky. The reward is worth it, however, as programmers can then reason much more easily about the behavior of macros and start to rely on them in large projects. Macros are a powerful feature of programming languages, and many newer languages have some form of macro system. Sadly, these are often not hygienic, or their designers say, "Oh, we'll add hygiene later." Given how important hygiene is, and how tricky it is to get right, language designers should really implement hygiene in their macro systems from the very beginning.
