In order to continue making progress with my little Janet game, I had to write some tests. Because even though I’d only just started, I already had a bug.

(There is a piece missing from the bottom of this rectangle. There should not be a piece missing from the bottom of this rectangle.)

Now, I’m sure that I wrote plenty of bugs over the course of writing my raycaster, and I didn’t bother writing tests for any of them – I just fixed them, re-ran the code, and manually verified that the bug had disappeared.

But this bug was a very shy bug. It was hard to manually reproduce it; it was hard to trigger the exact conditions that caused it. So it would have been really annoying to fix it using my usual guess-and-check approach – it’d be much easier to trigger the conditions once, print out the state, put that into a test, and then re-run the test until I had a fix.

But, sadly, things are not so simple.

Because I cannot actually write tests.

I can only write self-modifying tests.

It’s an unfortunate, incurable condition. Now that I am used to writing self-modifying tests, I can’t go back to writing tests strewn with boolean assertions. I have tasted of ambrosia, and cannot now return to wine.

what is a self-modifying test

It’s a term I just made up.

I’m sorry. There is not, as far as I know, a standard accepted term for this kind of test.

Mercurial calls them “unified tests,” which was the first term I ever heard for them. Jane Street calls them “expect tests,” so that is what they are in my head. But most libraries seem to re-use the term “snapshot tests.”

But “snapshot test” is too vague a term. Self-modifying tests are a type of snapshot test, but not all snapshot tests are self-modifying tests.¹ So I’m going to use this dumb term until I come up with a better one, or until I get over myself and start calling them “snapshot tests” like everyone else in the world.

Anyway.

Writing a self-modifying test feels like using a REPL. A REPL with all your editor integration and keybindings working the way you like, and no weird half-baked readline to wrestle with.² You just write an expression, run the test, and see the result right in your editor. Once it looks right, you save the file and move on – and if it ever changes again, the test will fail.

More importantly, reading a self-modifying test is like reading a REPL session – which is something that my brain just intuitively groks more easily than assert or .expect.not.to.be.whatever. And when a test fails, instead of getting assert failed: 2 should have been 4 and having to spend a minute trying to figure out what actually happened, you get to look at a diff that shows you exactly what it should be and what it is instead. With context and everything. It’s great.

I could write a whole post about how much I love self-modifying tests – and I probably will, as soon as I come up with a catchier term³ – but that’s all I’ll say about them for now.

I’m having fun writing Janet, but I need to write some tests.

And I can’t go back to writing regular assert-based tests.

So I guess I’ve got a test framework to write.

furious coding montage

I wrote it. It’s done. It works. You can use it, if you want to. I even wrote examples and documentation.

It’s called Judge, and it looks like this:

(use judge)

(defn capitalize [str]
  (string
    (string/ascii-upper (string/slice str 0 1))
    (string/slice str 1)))

(test "test capitalization"
  (each name ["eleanor" "chidi" "tahani" "jason"]
    (expect (capitalize name) "Eleanor" "Chidi" "Tahani" "Jason")))

But that doesn’t really do it justice. All tests look like that. To really understand Judge, you need to observe it in motion. Because it really looks like this:

I feel the need to point out that, although that is a recording of an Emacs session, there is nothing Emacs-specific about Judge. This is not some complicated thing that connects to some sort of Emacs sub-process and uses some RPC mechanism to evaluate expressions and return values like you may have seen elsewhere. It looks fancy, but all you’re really seeing are commands to:

  • execute the current Janet file (which writes out a test.janet.corrected)
  • display the diff between test.janet and test.janet.corrected
  • mv test.janet.corrected test.janet
  • reload test.janet from disk, highlighting any differences

Which are all generic operations that you can easily do from any editor.⁴ Or from no editor! I just did this from the command line for a while before I wrote the Emacs “integration.”

Anyway: this workflow is great and it’s very pleasant to use, but I have no illusions about you actually using my weird testing library in this weird language that you’ve barely heard of. You are not here for the library; you are here for the overly verbose, rambling story about writing the library. And I’m happy to oblige.

how Judge works

Judge is surprisingly simple. The code, I mean. The core of Judge – the API you import to actually define tests – is only about 100 lines of code. The test runner is another 300 or so, but that’s all straightforward test selection and argument parsing and error printing stuff.⁵

But it took me a while to write those lines. And in the process I learned a lot about Janet, a lot about macros, and a lot about lisp in general.

So let’s break Judge down into a few parts. We need to write the test macro, to define a test. We need to write the expect macro, to define a specific value. And we need a way to rewrite Janet code, to produce the .corrected files.

Let’s start with that last bit, because that’s the crux of self-modifying tests – the self-modifying part.

So: in order to update our tests, we’ll have to parse the file, find the expression we want to change, and then rewrite the file with the new expression spliced in.
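The splicing step on its own is just string surgery. Here is a minimal sketch, assuming we already know the byte index where the old form starts and how many bytes it spans; `splice-form` is a hypothetical helper, not part of Judge:

```janet
# Hypothetical helper: replace len bytes of source, starting at byte index
# start, with the replacement text.
(defn splice-form [source start len replacement]
  (string (string/slice source 0 start)
          replacement
          (string/slice source (+ start len))))

# Pretend the (expect ...) form starts at byte 15 and is 16 bytes long.
(def original "(test \"math\"\n  (expect (+ 2 2)))\n")
(def corrected (splice-form original 15 16 "(expect (+ 2 2) 4)"))
(print corrected)
```

Writing `corrected` out to a `.corrected` file would then be a single `spit` call. The interesting part is finding those two numbers.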

But how do we know where the expression we want to replace is? When we expand the expect macro, can we somehow include the position at which that macro occurs in our test file?

Yes! Pretty easily, actually. Take a look at this:

$ cat example.janet
(defmacro print-location []
  (def filename (dyn :current-file))
  (def macro-invocation (dyn :macro-form))
  (def [line col] (tuple/sourcemap macro-invocation))
  ~(printf "Macro expanded at %s:%d:%d" ,filename ,line ,col))

(print-location)
(print-location)
$ janet example.janet
Macro expanded at example.janet:7:1
Macro expanded at example.janet:8:1

During macro expansion, Janet sets a few dynamic variables which we can read. One of these is :macro-form, which is – as you might expect – the form actually being expanded.

Usually you wouldn’t really care about this, because usually you only care about the forms passed to your macro – the macro’s arguments – but in our case we’re going to rewrite the entire (expect expression expected-value) form, not just the expected-value part.⁶

Once we have that, we call tuple/sourcemap to get the line and column of that form.
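Reading the position is only half the trick: the macro can also bake that position into the code it expands to, so whatever value the expression produces at runtime arrives tagged with where it came from. A rough sketch of the idea, using a hypothetical `observe` macro and `observations` array (this is not how Judge’s `expect` is actually written):

```janet
# Hypothetical: every (observe ...) call records [line col value] at runtime.
(def observations @[])

(defmacro observe [expr]
  # Read the position of this (observe ...) form during macro expansion...
  (def [line col] (tuple/sourcemap (dyn :macro-form)))
  # ...and embed it as literal numbers in the expanded code.
  ~(array/push observations [,line ,col ,expr]))

(observe (+ 2 2))
(observe (string "Jan" "et"))
(pp observations)
```

Each entry pairs a source position with a runtime value, which is roughly the raw material a test runner needs to decide what to splice back into the file.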

on tuples

Okay, so, tuple/sourcemap is weird for a couple reasons.

First off, Janet doesn’t have “lists.” It has “tuples.” This is Janet’s term for an immutable array, or an immutable vector, or whatever you want to call it.

Normally you’d write a tuple with square brackets: [1 2 3]. But that’s just syntax sugar for a quoted form, right?

repl> '(1 2 3)
(1 2 3)
repl> [1 2 3]
(1 2 3)

Except… not exactly. Because look; there’s more:

repl> '[1 2 3]
[1 2 3]

What?

So it turns out tuples have this extra bit of information: whether or not they are “bracketed” tuples or “parenthesized” tuples. You can query the “tuple type” at runtime:

repl> (tuple/type '(1 2 3))
:parens
repl> (tuple/type '[1 2 3])
:brackets

But note that any tuple created at runtime is a parenthesized tuple, even if it was defined using square brackets:

repl> (tuple/type (tuple 1 2 3))
:parens
repl> (tuple/type [1 2 3])
:parens

This is very annoying and it will come back to bite us later.

If I could change one thing about Janet, it would be this. I would introduce a first-class list type that uses parentheses, parse forms as lists, and say that tuples are always represented with square brackets. But I can’t. So… whatever. I think this is the grossest thing I have encountered in Janet so far, and now you have encountered it too. I hope it’s not enough to turn you off to Janet, because most of the language is really quite nice.
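For what it’s worth, you can build a bracketed tuple at runtime if you ask for one explicitly: the core library has `tuple/brackets` alongside `tuple`. A quick sketch:

```janet
# tuple builds a parenthesized tuple; tuple/brackets builds a bracketed one.
(def runtime-parens (tuple 1 2 3))
(def runtime-brackets (tuple/brackets 1 2 3))
(pp (tuple/type runtime-parens))    # prints :parens
(pp (tuple/type runtime-brackets))  # prints :brackets
```

This only helps when you control the code that builds the tuple, of course.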

Anyway, the point of all that is: our macro has access to its own form, which is a tuple of type :parens.

$ cat example.janet
(defmacro even-simpler-macro []
  ~(pp (quote ,(dyn :macro-form))))

(even-simpler-macro)
$ janet example.janet
(even-simpler-macro)

To extract the position of that tuple in our source file, we use tuple/sourcemap.

on source maps

Okay, this part is pretty gross too, but if you can stomach tuple/type you’ll be just fine.

Every tuple – be it a tuple constructed by parsing a file, a tuple constructed as a square-bracketed literal, or a tuple constructed dynamically at runtime – every tuple carries two extra values around with it, whether they’re meaningful or not: the “source map line” and “source map column.”

These are mutable values in your otherwise immutable tuple. They’re usually both set to -1, but you can call (tuple/setmap) to change them:

repl> (def runtime-tuple [1 2 3])
(1 2 3)
repl> (tuple/sourcemap runtime-tuple)
(-1 -1)
repl> (tuple/setmap runtime-tuple 10 20)
(1 2 3)
repl> (tuple/sourcemap runtime-tuple)
(10 20)

The only time they are set is when the tuple was constructed by the Janet parser – think, like, quoted forms, or macro arguments – or when you set them yourself.
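You can see the difference without writing a macro by calling `parse`, which runs the Janet parser over a string and returns the first value it finds. A small sketch (the strings are just illustrative):

```janet
# parse returns the first value in a string, so this tuple comes straight
# from the parser and carries the position of its opening paren.
(def parsed (parse "\n\n(+ 1 2)"))
(def [line col] (tuple/sourcemap parsed))
(printf "parsed tuple starts at %d:%d" line col)

# A tuple built at runtime has no position to report.
(def built (tuple '+ 1 2))
(def [built-line built-col] (tuple/sourcemap built))
(printf "built tuple: %d:%d" built-line built-col)
```

The parsed tuple reports line 3, column 1, the same 1-based convention as the `example.janet:7:1` output earlier; the runtime tuple reports -1 for both.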

But… why do source maps live on tuples? Sure, most expressions in Janet are probably tuples. But what if we want to find the location of, say, a string literal? How do we do that?

Umm well see the thing is you sort of can’t.

I mean, you sort of can. If you’re using Janet’s parser API directly, you can get the location of an arbitrary value by saying “excuse me would you please wrap the result in a tuple.” Which is… weird; it’s weird; it feels like a gross hack, but that is the way you do it.
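Concretely, that means passing a truthy second argument to `parser/produce`, which reasonably recent Janet releases support. A minimal sketch, finding the position of a string literal:

```janet
(def p (parser/new))
(parser/consume p "\n\"hello\"\n")
# A truthy second argument asks for the value wrapped in a 1-element tuple;
# the wrapper tuple carries the source map that the string itself can't.
(def wrapped (parser/produce p true))
(def [value] wrapped)
(def [line col] (tuple/sourcemap wrapped))
(printf "%s starts at %d:%d" value line col)
```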

But if you’re writing a macro – as far as I can tell – you’re just out of luck. You can’t ask the parser to wrap the forms it passes to your macro in tuples. This is why we’re rewriting the whole (expect) expression, not just one of its arguments: because we know that the only forms that can get macro-expanded are tuple forms, so we know that (dyn :macro-form) will be a tuple with a source map.

Okay, so that’s the first piece of the puzzle. We know where the (expect) form starts.

But that’s not sufficient to rewrite the file. It’s easy to go from a [line column] pair to a byte index in the file, but how many bytes do we need to replace with our corrected value?

And that’s much harder to figure out. I’m not sure what the right answer is, and I’m not very happy with the solution I came up with.

My solution was to use Janet’s parser API, and to start parsing the file from the expect form’s open paren:

(expect (+ 2 2) 4)
^

Then advance the parser one byte at a time until the parser produces a value, and count up the number of bytes you had to advance it by. This is gross, but it works, and I couldn’t come up with anything better.
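Here is a sketch of both steps, the easy one and the gross one, under a few assumptions: the file uses Unix newlines, source map columns count bytes starting from 1, and the form begins with an opening delimiter, so the parser knows it is finished as soon as it sees the matching close (a bare symbol or number would only be produced once the parser sees whatever follows it). `line-col->index` and `form-length` are hypothetical names, not Judge’s actual functions:

```janet
# Hypothetical helpers sketching the approach described above.
# Assumes Unix newlines and 1-based [line col] pairs, as tuple/sourcemap returns.

(defn line-col->index
  "Convert a 1-based line and column into a 0-based byte index into source."
  [source line col]
  (var index 0)
  (var current-line 1)
  # Skip past one newline for every line before the target line.
  (while (< current-line line)
    (def nl-index (string/find "\n" source index))
    (when (nil? nl-index) (error "line is past the end of the source"))
    (set index (+ nl-index 1))
    (++ current-line))
  (+ index (- col 1)))

(defn form-length
  "Feed bytes from start to a fresh parser until it produces a value; return the byte count."
  [source start]
  (def p (parser/new))
  (var i start)
  (while (not (parser/has-more p))
    (when (>= i (length source))
      (error "reached the end of the source before the form was complete"))
    (parser/byte p (get source i))
    (when (= (parser/status p) :error)
      (error (parser/error p)))
    (++ i))
  (- i start))

# The expect form below starts on line 2, column 3.
(def source "(test \"addition\"\n  (expect (+ 2 2) 4))\n")
(def start (line-col->index source 2 3))
(def len (form-length source start))
(print (string/slice source start (+ start len)))  # prints (expect (+ 2 2) 4)
```

Because the parser handles strings, comments, and nested forms itself, a `)` inside a string literal won’t end the form early; the price is one parser call per byte.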

we did it

Sort of.

We did the first part; we did the self-modifying part. We did the hard part. We still need to do the other parts, but you know what? We can do the other parts in another blog post. It took an upsettingly long time to record that tiny screencast, for reasons that are entirely uninteresting, and I am tired now and want to go outside.


  1. Some libraries (Jest, Insta) use the term “inline snapshot” to distinguish self-modifying in-band snapshots from normal, boring, out-of-band snapshots. But I feel like that really buries the lede.

  2. The Janet REPL is especially frustrating, because ⌃C does not clear the current line – as I am used to in every other REPL I have ever used – but instead exits the REPL. The Janet REPL also has no persistent history, so when you inevitably do this out of reflex, you lose everything you’ve done so far and can’t up-enter your way back to where you were. Also there’s no soft-wrapping of the current input line so like yeah you don’t really want to spend very much time there.

  3. Self-modifying tests, self-healing tests, self-driving tests; REPL tests, ouroboros tests, reflective tests, madlib tests…

  4. Okay, I don’t know if any other editor can do the “reload file and highlight any changes” bit, but that’s also completely unnecessary.

  5. And a lot of that code only exists to support context-sensitive tests, which is a feature that I’ll save for a later blog post. If you took those out, I bet you could cut those numbers in half.

  6. There is a good reason for this. Not just because expect is a variadic macro – you can supply multiple “expected values” – but because it would be much harder to find the position of an arbitrary expression, because of the way Janet’s source maps work. But we’ll get to that in a minute.