A while back, Rich Loveland asked how one might write a Scheme debugger. I realized that I've written many a Scheme interpreter, but I've never really thought about how to write a debugger. This post is a first step down that path. We'll write what I'd consider a minimally functional debugger. It will allow you to break a running program into the debugger (albeit by invoking a primitive function in the program you're running) and then inspect the variables that are available. As an added bonus, you'll be able to evaluate arbitrary Scheme expressions from the debugger and even change the values of variables. For the moment, however, we will not support stepping program execution, which is admittedly an important feature of debuggers.

We'll start by writing an interpreter for a small but interesting subset of Scheme. Then we'll show see how to add debugging features to the interpreter. As usual, you'll be able to see all the code on Github.

The Scheme Interpreter🔗

We'll start with the so-called "Three Line Interpreter" and then add a few features to make it a bit more interesting. If you're familiar with Scheme interpreters, feel free to skim this section. If you want more detail, check out the Essentials of Programming Languages. Our interpreter, value-of, takes two arguments. The first is the expression to evaluate, and the second is the environment that maps variable names onto values. Here's the interpreter:

(define (value-of e env)
  (match e
     (guard (symbol? x))
     (lookup x env))
    ((lambda (,x) ,e)
     (lambda (v) (value-of e (cons (cons x v) env))))
    ((,[e1] ,[e2])
     (e1 e2))))

It's a few more than three lines, but we're mainly interested in the three match clauses. The first line is the variable reference line. If you try to evaluate something like x, we use lookup to go find the current value of x in the environment. I'll omit the definition of lookup, but it's basically the same as (cdr (assq x env)).

The next line is the lambda line. This creates a procedure from a lambda expression. We cheat and use perfectly good Scheme's built-in lambda. Notice this procedure we create takes a value, v, and then calls value-of on the body of the lambda. The important bit is the (cons (cons x v) env), which adds a binding of x to the value that was passed in to the environment, so that the interpreter can find the correct value when evaluating the body.

Finally, we have the application line. Basically, we use match's catamorphism feature to recursively compute the value of both the thing to be applied and the value to apply it to. Since our lambda line evaluates to Scheme procedures, we just apply the value e1 to its argument, e2.

Although this doesn't seem like a very rich language, you could use it to compute any computable function if you wanted to. You probably don't want to though.

More Features🔗

Now that we have the core of our interpreter, we can add features to the language. In most cases, this is as simple as adding a few more clauses to our match expression. Let's start with numbers.

    (,n (guard (number? n)) n)
    ((,op ,[e1] ,[e2]) (guard (memq op '(+ * -)))
     ((eval op) e1 e2))

The first line leaves numbers as they are. There's not really much more to do with them.

The second line lets you do things to numbers. We make sure the operation is one of +, * or -. We could add more, but these are enough for now. I cheated once again and use eval to convert the symbol representing the operator into the Scheme procedure that performs that operation. As before, we use catamorphism to evaluate the two arguments to the operator.

To make debugging a little more interesting, let's add some side effects. We add set! with the following match clause.

    ((set! ,x ,[e])
     (update-env! x e env))

Of course, this isn't very interesting without knowing what update-env! does. Here you go!

(define (update-env! x v env)
  (if (eq? x (caar env))
      (set-cdr! (car env) v)
      (update-env! x v (cdr env))))

Basically, we just search through the environment until we find the variable to change and then use set-cdr! to change its value.

Finally, let's add begin. We could simulate this with lambdas and function applications, but it's much cleaner just to add it directly. We get begin with this clause:

    ((begin ,e* ... ,e)
       (let loop ((e* e*))
         (if (pair? e*)
               (value-of (car e*) env)
               (loop (cdr e*)))))
       (value-of e env)))

We start by evaluating each of the first expressions for their effect, and then we return the value of the last expression.

At this point, we have enough that we can start to write some reasonably interesting programs. In particular, we can write programs that have bugs that we might want to debug. Let's add a debugger!

The Debugger🔗

The first thing we're going to do is add a way to get into the debugger. Most of the time the debugger does this by loading the source files and letting you click on the point in the code where you want a break point. This takes more UI work than I want to do right now, so instead we'll just add a special command to our language, (debug), which breaks into the debugger. As usual, this is as simple as adding another match clause:

     (debugger env))

This calls out to a function we have yet to define, called debugger. We pass in the environment debugger needs to see the environment so we can inspect it.

The debugger itself is just a read-eval-print loop (REPL). It prompts the programmer for a command, then evaluates the command, prints out any results and then continues. Let's start with a debugger that we can get out of:

(define (debugger env)
  (printf "debug> ")
  (let ((cmd (read))
        (continue #t))
    (match cmd
      ((continue) (set! continue #f))
      (,else (printf "unknown command\n")))
    (if continue
        (debugger env)
        (printf "running...\n"))))

We start out by printing debug> and then using read to read in an S-Expression. Much like our interpreter (in fact, this is just an interpreter for another language), we use match to determine which command we're given and then evaluate it. For now we support one command, (continue), which continues the execution of the program. We do this by setting a flag that tells the debugger not to continue it's loop. Since we were called by value-of, we just return to that function and the interpreter picks up where it left off.

Here's an example debugging session:

> (value-of '(debug) '())
debug> (continue)

At the Scheme REPL, we call value-of with a simple expression that immediately breaks into the debugger, and we start in the empty environment ('()).

Most debuggers give you some kind of call stack. The closest analog to that in our language is a list of all the variables in scope, so let's add a way to see these. Once again, we add one more match clause:

      ((show-env) (show-env env))

This just forwards to a procedure, show-env, which prints out the values in the environment. Here's an example of how to use it:

> (value-of '(((lambda (x)
                 (lambda (y)
                       (+ x y))))
debug> (show-env)
0. x: 4
1. y: 5
debug> (continue)

So now we can stop the execution, continue the execution, and see what's in the environment. What if we want to change things? We could add commands to set values, but a more powerful way is to use the target language itself to do this. Thus, we'll add an eval command, which evaluates an expression in the debugger:

      ((eval ,e)
       (printf "~s => ~s\n" e (value-of e env)))

Now, if we want to change values, we can just evaluate a set! expression, like this:

> (value-of '(((lambda (x)
                 (lambda (y)
                       (+ x y))))
debug> (show-env)
0. x: 4
1. y: 5
debug> (eval (set! x 115))
(set! x 115) => #<void>
debug> (show-env)
0. x: 115
1. y: 5
debug> (continue)

As expected, the resulting value changes to reflect the fact that we modified the environment during program execution.

So there's a first steps towards a Scheme debugger. We were able to do this with relatively few changes to our interpreter. It seems to me that adding more advanced features would require more changes. For example, there's no way to inspect or change the code that's running now. To do this, we would have to keep track of this data in a form that is accessible to the debugger. Furthermore, we're still missing a way to do finer grained execution control, such as stepping over a single statement or out of the current lambda. I suspect most if not all of these problems can be solved by writing our interpreter in continuation passing style. I hope to explore this in a later post.