Patterns with Ellipses
Last time, we talked about matching patterns in Scheme. Now we will look at how to extend the pattern matcher and template instantiation code to handle patterns with ellipses.
Do start with, let's look at an example macro. Imagine we wanted to
take the or2
macro from last time, but we want to extend it to
support any number of arguments. Certainly, writing (or2 a (or2 b c))
is a bit of a pain. It'd be much better to simply write (or a b c)
. We can do this by using an ellipses in our patterns, as seen in
this macro, called my-or
to avoid clashing with the built in or
:
(define-syntax my-or
(syntax-rules ()
((_ e) e)
((_ e e* ...)
(let ((t e))
(if t t (my-or e* ...))))))
This macro uses two patterns. The first matches my-or
with exactly
one argument. The second matches my-or
with one argument and any
number of remaining arguments. The second pattern expands into a
recursive call to my-or
, but it decreases the number of arguments
each time until we finally hit the base case. We might trace the
expansion like this:
(my-or a b c)
=>
(let ((t.1 a))
(if t.1 t.1 (my-or b c)))
=>
(let ((t.1 a))
(if t.1 t.1
(let ((t.2 b))
(if t.2 t.2 (my-or c)))))
=>
(let ((t.1 a))
(if t.1 t.1
(let ((t.2 b))
(if t.2 t.2 c))))
Unfortunately, pattern matching and template instantiation become much
trickier in the presence of ellipses. As we will see, however, when
all is said and done we'll have added just one more cond
clause to
both the matcher and the instantiation function.
The basic idea is that we want to keep everything more or less the
same, but when we encounter a ...
, we want to recursively match or
instantiate a pattern as many times as we can and then pick up where
we left off. In my mind, the most complicated part of this becomes
representing the environment between the matcher and instantiation
code. Last time, we represented the environment as an association
list. It's attractive to try to simply pair the pattern variable name
with a list of things it matched instead of just a single value, but
this doesn't seem to preserve quite enough information. Suppose we
matched (my-or (+ 1 2) a b c)
against (_ e e* ...)
, yielding the
bindings ((e . (+ 1 2)) (e* . (a b c)))
. Then let's see what happens
if for some reason we used this to instantiate the template '((e e*) ...)
. We might use a rule like "when instantiating a ...
template,
look up every variable in the environment and zip them together." This
rule would lead to the following instantiation:
'((+ a) (1 b) (2 c))
This rule wouldn't be too hard to implement, but it's also probably
not what we want. In this case, we've broken apart the (+ 1 2)
expression into its individual components, when instead we'd probably
like to keep this together. The semantics we'd like instead lead to
the following result:
'(((+ 1 2) a) ((+ 1 2) b) ((+ 1 2) c))
Admittedly, this is a somewhat contrived example, but as we write more macros it will become apparent that this is usually the behavior we want.
In order to preserve the information we need, we'll represent the
environment more like a tree, which includes a ...
node that
describes pattern variables bound under a ...
. Thus, if we matched
(my-or (+ 1 2) a b c)
against (_ e e* ...)
, we'd end up with the
environment ((e . (+ 1 2)) (... ((e* . a)) ((e* . b)) ((e* .c))))
. Each ...
node in the environment includes a list of
bindings matched under a ...
pattern. The list of association lists
approach allows us to keep track of which variables were matched
together.
We will see that building this environment in the matcher is fairly
straightforward, but more care is needed with template
instantiation. We will define a flatten operation on environments,
which takes in one environment, strips off a level of ellipses, and
returns a list of environments. We then recursively apply the template
instantiation function to each of these new environments. To look at
our running example, ((e . (+ 1 2)) (... ((e* . a)) ((e* . b)) ((e* .c))))
would convert to this:
(((e . (+ 1 2)) (e* . a))
((e . (+ 1 2)) (e* . b))
((e . (+ 1 2)) (e* . c)))
It's easy to see now how instantiating '((e e*) ...)
just requires
mapping the instantiate function with (e e*)
over this list of
environments.
We're not quite done yet though. What happens if we had the pattern
(_ (a ...) (b ...))
, and matched this against ((1 2 3) (x y))
?
We'd end up with this environment:
((... ((a . 1)) ((a . 2)) ((a . 3)))
(... ((b . x)) ((b . y))))
If we try to flatten this, we'll see that we won't have enough b
s to
correspond with each a
. To avoid this problem, we will make the
flatten procedure take a template as input, and only consider ...
nodes that contain pattern variables referenced by the template. This
way, we only expand the environment for either a
or b
, but not
both.
So with all of this out of the way, we can look at an exension of
match
that handles ellipsis patterns:
(define (match* p e sk fk)
(cond
((and (pair? p) (pair? (cdr p)) (eq? '... (cadr p)))
(let loop ((e e)
(b '()))
(if (null? e)
(match* (cddr p) e
(lambda (b^) (sk `((... . ,b) . ,b^)))
fk)
(match* (car p) (car e)
(lambda (b^)
(loop (cdr e) (snoc b b^)))
(lambda ()
(match* (cddr p) e
(lambda (b^) (sk `((... . ,b) . ,b^)))
fk))))))
((and (pair? p) (pair? e))
(match* (car p) (car e)
(lambda (b)
(match* (cdr p) (cdr e)
(lambda (b^) (sk (append b b^)))
fk))
fk))
((eq? p '_)
(sk '()))
((symbol? p)
(sk (list (cons p e))))
((and (null? p) (null? e))
(sk '()))
(else (fk))))
This is almost identical to match
from before, but we've added lines
3--16. These lines first check if the pattern is a ...
pattern, and
if so, we go into a loop where we try to match the pattern as many
times as possible. As we do this, we build up a list of environments
from each successful match. Once we get a failure, instead of calling
the failure continuation we were given, we just continue on matching
the rest of the pattern. We can try matching this against a let
pattern as follows:
> (match* '(_ ((x e) ...) b) '(let ((x 5) (y 6)) (+ x y)) (lambda (b) b) (lambda () #f))
((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y))
Based on our previous discussion of environments, this is what we wanted.
Now let's look at how we combine a pattern with a template. We'll
start with a helper, called extract-...
, which pulls out the
relevant ellipsis bindings from the environment:
(define (extract-... p bindings)
(if (null? bindings)
'()
(let ((rest (extract-... p (cdr bindings)))
(b (car bindings)))
(if (and (eq? (car b) '...) (not (null? (cdr b))))
(let ((names (map car (cadr b))))
(if (ormap (lambda (x) (mem* x p)) names)
(cons (cdr b) rest)
rest))
rest))))
This function works by walking over the set of bindings. When we
encounter a ...
node, we check if any of the variables bound in that
sub-environment are referenced in the pattern (this is what the
(ormap (lambda (x) (mem* x p)) names)
clause does). If any variables
are relevant, we include this sub-environment on the list that is
returned.
Doing this relies on another helper, mem*
, which determines whether
a variable is relevant to a pattern. As its name suggest, this was
originally a blind tree walk that checked if the symbol occurs
anywhere in the pattern. Sadly, it's not that simple. Suppose we had
the environment ((... ((a . 1)) ((a . 2)) ((a . 3))) (... ((b . x)) ((b . y))))
and we were instantiating the pattern ((a b ...) ...)
. Since there are two levels of ...
, we only want to consider
a
on the first level, and then when we get to the second ...
, we
need to consider only b
. This means we need to cut off search when
we see a sub-pattern with an ellipsis. That's what lines 3 and 4 do in
this code:
(define (mem* x ls)
(cond
((and (pair? ls) (pair? (cdr ls)) (eq? (cadr ls) '...))
#f)
((pair? ls)
(or (mem* x (car ls)) (mem* x (cdr ls))))
(else (eq? x ls))))
With our helper functions out of the way, we can look at the extended
version of instantiate
:
(define (instantiate* p bindings)
(cond
((and (pair? p) (pair? (cdr p)) (eq? '... (cadr p)))
(let ((bindings... (extract-... (car p) bindings)))
(if (null? bindings...)
(instantiate* (cddr p) bindings)
(append
(apply map (cons (lambda b*
(instantiate* (car p)
(append (apply append b*)
bindings)))
bindings...))
(instantiate* (cddr p) bindings)))))
((pair? p)
(cons (instantiate* (car p) bindings)
(instantiate* (cdr p) bindings)))
((assq p bindings) => cdr)
(else p)))
One again, this includes the same code as before, extended with lines 3--13 to handle ellipses. This clause works by finding the relevant sub-environments. If it finds any, we append each of the sub-environments to the full environment and use each of these new environments to instantiate the sub-template.
We can pass this the results of the our match*
call above, and see
that it is able to reconstruct a let
expression:
> (instantiate* '(let ((x e) ...) b)
'((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y)))
(let ([x 5] [y 6]) (+ x y))
Of course, we can also use this to convert let
expressions to
function applications:
> (instantiate* '((lambda (x ...) b) e ...)
'((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y)))
((lambda (x y) (+ x y)) 5 6)
And, we can handle some harder cases too:
> (instantiate* '((a b ...) ...)
'((... ((a . 1)) ((a . 2)) ((a . 3)))
(... ((b . x)) ((b . y)))))
((1 x y) (2 x y) (3 x y))
And here's my personal favorite:
> (instantiate* '((a a ...) ...) '((... ((a . 1)) ((a . 2)) ((a . 3)))))
((1 1 2 3) (2 1 2 3) (3 1 2 3))