Matching Patterns with Scheme
A while back, I wrote a
post
about macros in Scheme. Today I want to take a look at how one might
begin to implement a macro system. In Scheme, whether you use
syntax-rules
or syntax-case
do write your macros, at some point
you'll write patterns and templates. Macros match their input against
a pattern and then use this to instantiate a template. Let's consider
a two-way or
macro:
(define-syntax or2
(syntax-rules ()
((_ e1 e2)
(let ((t e1))
(if t t e2)))))
We can see how this works at the REPL by using expand
:
> (expand '(or2 1 2))
(let ([t.8 1]) (if t.8 t.8 2))
This matches the expression (or2 1 2)
against the pattern (_ e1 e2)
. The underscore means we don't care what's in that position (it's
always or2
, since this is part of the or2
macro definition). The
names e1
and e2
allow us to refer to what occurs in this pattern
in the template. When instantiating the template, the macro system
replaces all references to e1
with the chunk of syntax from that
position (1
in this case), and likewise with e2
. The result is the
expression, (let ([t.8 1]) (if t.8 t.8 2))
. Here we let-bind the
first expression because we do not want to accidentally evaluate any
side effects it might contain twice. Don't worry too much about why
the variable name is t.8
instead of just t
. This has to do with
something called hygiene, which I will hopefully say more on in a
later post.
Now let's take a look at how we would write a function to match
patterns like this. We'll start with a procedure called match
that
takes four arguments: the pattern to match against, the expression to
match, a success continuation, and a failure continuation. The success
continuation is called if the expression matches the pattern, and the
failure continuation is called otherwise. When I first started writing
this, it seemed like it would be cleaner to use success and failure
continuations, although I'm not entirely sure that's true anymore. At
the moment, we will support three things for patterns:
- Pairs - A pair pattern matches an expression if and only if the expression is a pair, and the car of the pattern matches the car of the expression and the cdr of the pattern matches the cdr of the expression.
- Symbols - A symbol pattern binds a pattern variable. This matches any expression, and binds that expression to the variable.
- Null - A null pattern matches an expression if and only if the expression is null.
Scheme's pattern matching supports several other features, like auxiliary keywords, which we're ignoring for the now for the sake of simplicity. Using our three kinds of patterns so far, we can translate this pretty directly into a Scheme program:
(define (match p e sk fk)
(cond
((and (pair? p) (pair? e))
(match (car p) (car e)
(lambda (b)
(match (cdr p) (cdr e)
(lambda (b^) (sk (append b b^)))
fk))
fk))
((and (symbol? p))
(sk (list (cons p e))))
((and (null? p) (null? e))
(sk '()))
(else (fk))))
Notice that we pass a list of bindings to the success continuation. These will be used by the template instantiation code. Let's try an example:
> (match '(_ e1 e2) '(or2 1 2) (lambda (b) b) (lambda () #f))
((_ . or2) (e1 . 1) (e2 . 2))
For the most part, this does what we want. The matcher says the
expression matches the pattern, and tells us that e1
was bound to
1
and e2
was bound to 2
. However, we also bound _
to or2
. In
reality, we want _
to mean ignore, which we can do by adding one
line before the symbol?
line in our cond clause:
((eq? p '_)
(sk '()))
Now we get only the bindings we care about.
The next part is to use the resulting bindings to instantiate the
template. We'll do this with a function called instantiate
, which
takes a template (called p
), and a list of bindings. This is a
straightforward tree walk that tries to replace symbols with
expressions when they are bound:
(define (instantiate p bindings)
(cond
((pair? p)
(cons (instantiate (car p) bindings)
(instantiate (cdr p) bindings)))
((assq p bindings) => cdr)
(else p)))
If we did this right, we should be able to instantiate the template
from the or2
macro above.
> (instantiate
'(let ((t e1))
(if t t e2))
'((e1 . 1) (e2 . 2)))
(let ([t 1]) (if t t 2))
It looks like we got what we wanted, although without hygiene.
At this point we have a pattern matcher and template instantiation function that is suitable for making a very simple macro system. One very nice feature that we are missing at the moment is ellipsis patterns, which allow us to match repeated patterns. For example, we could use this to write a custom let macro like this:
(define-syntax my-let
(syntax-rules ()
((_ ((x e) ...) b)
((lambda (x ...) b) e ...))))
In a post at some point in the hopefully near future, I will show how
to handle patterns with ...
in them.