Given the contents of my bar:

$ cat facts/bar
brandy
dry vermouth
lemon
light rum
lime
orange liqueur
reposado tequila
sugar

And my cocktail recipe book:

$ head facts/recipes
martini <- london dry gin
martini <- dry vermouth
daiquiri <- light rum
daiquiri <- lime juice
daiquiri <- simple syrup
margarita <- blanco tequila or reposado tequila
margarita <- lime juice
margarita <- orange liqueur
margarita <- lime wedge
...

What cocktails can I mix?

$ cat results/mixable
daiquiri
margarita
sidecar

Gosh, that’s not very many. What could I add to my bar to expand my options?

$ cat results/shopping-list
london dry gin -> gimlet
london dry gin -> martini
champagne -> airmail
cognac -> between-the-sheets
sherry -> sherry-cobbler
rhum agricole -> ti-punch

this seems dumb

No you’re wrong it’s smart. Take a closer look at that Daiquiri: the recipe calls for lime juice, but we don’t have lime juice in our bar. We only have lime. So why does it show up as mixable?

I’m so glad you asked. These results are generated by something called Mixologician, a tiny Datalog program I wrote.

And Mixologician knows some rules. It knows that limes make lime juice – and lime wedge, and lime zest, etc. It even knows that lime zest plus sugar makes lime cordial – so it knows that we’re only one ingredient away from being able to make a Gimlet, even though we don’t technically have a single ingredient that the drink calls for.

can you ease me into this with a semi-apocryphal origin story

Sure can!

I like cocktails. Once upon a time, I liked to go to restaurants and order cocktails. But for some reason I haven’t been going to restaurants for a while, and instead I’ve been making cocktails at home.

But I don’t want to mix the same cocktails every time. A big part of what I enjoyed about the restaurant-cocktail-ordering experience was the variety: every fancy hipster restaurant I used to frequent had its own cocktail menu – it’s own unique set of drinks, full of ingredients I had never heard of.

Which is sort of a problem, for the home bartender.

See, I don’t have a very comprehensive liquor collection. I’ve got the basics, sure, and over the course of quarantining I’ve acquired a few fancier ingredients. But fancy ingredients usually aren’t very versatile: I bought a bottle of Amaro Nonino once to mix a Paper Plane. But it turns out I don’t like really Paper Planes. So now I just have, like, 97.1% of a bottle of Amaro Nonino, and nothing to do with it.1

That was not very efficient purchase. We can do better.

So I wrote a little program to tell me: given what I have in my bar right now, what should I add that will enable me to make the maximum number of new cocktails. Or in other words, what is my most efficient purchase – what is the ingredient that I am most “blocked on.”

that relatable story captured my attention tell me more

I thought this would be a good opportunity to try out Datalog, a word that I’d heard before.

Now, you might be thinking that this sounds like a trivial problem, and you can just write a couple lines of Python to do set subtraction based on what you have and what you need and then just filter down to the cocktails missing exactly one ingredient.

And… that’s true. This is not a very hard problem.

The reason logic programming seemed appropriate to me was the idea of these “production rules:” ingredients that can become (or combine to become) other ingredients.

Now, I’m not saying that it would be hard to model that in your favorite scripting language, but it’s not really obvious any more how to do it without a guess-and-check sort of brute force search. Not that there’s anything wrong with that; it just feels like a more natural fit for logic programming.

Or not! Who knows. It’s not like I know anything about logic programming. This whole thing is just an excuse to try it out.

well let’s see it

The code, like all code, is on GitHub. You can look through the project – I would recommend starting with the tests, as they provide a sort of literate guide to how it works.

But I dare say the code is not all that interesting, by itself: if you already know Datalog, it’s trivial; if you don’t, it’s inscrutable.

So instead, I’m going to walk through the final result piece by piece, and talk about some of the missteps and mistakes I made along the way. Since this was my first encounter with Datalog, it took me a little while before I was able to “think” the way that Datalog wanted me to – and I think the errors I made along the way are a lot more interesting than the final result.

cat mixologician.dl

You can follow along at home if you’d like.

We open with some type definitions:

.type Ingredient <: symbol
.type Recipe <: symbol

Custom types make the rest of the code more readable, and will let the compiler catch dumb mistakes like mixing up the order of my relations.

symbol is what Datalog calls “string” – I point this out because it might be a little confusing if you’re used to LISP or Ruby or some other language that uses the term differently.

.decl Has(x : Ingredient)
.input Has(filename="bar")

Has is our first “relation.”

I had a lot of trouble thinking about relations until I had an insight that seems very obvious in hindsight: I actually use relations all the time, in relational databases. So it might be useful to think of Has like a table in SQL, with a single column. It’s not quite the same thing, but I found it was a decent starting point for thinking about how to “shape” my data.

That .input line just loads a list of ingredients from a text file – one ingredient per line. Inserts rows into our table, if you will – except that in Datalog, “rows” are called “facts.”

.decl Needs(drink : Recipe, ingredient : Ingredient)
.input Needs(filename="recipes", delimiter=" <- ")

This relation is more interesting: these are our recipes. Although in a traditional language we might think of a recipe as { name : string, ingredients : List<string> }, we have to sort of bend things to fit into the “table-like” representation of relations.

Again, we just load data from a plain text file. By default, Soufflé wants to read tab-separated values, but I made a custom delimiter because I think it reads a little nicer, and so that I wouldn’t have to deal with tabs. Nobody wants to deal with tabs.

.decl Begets(in : Ingredient, out : Ingredient)
.input Begets(filename="begets", delimiter=" -> ")
.input Begets(filename="auto-begets", delimiter=" -> ")

Begets is a relation we’re going to talk a lot about later. For now, just think of it as a bunch of statements like “limes beget lime juice” or “Cognac begets brandy” – ingredients that can become other ingredients, or ingredients that can act as other ingredients. The name Begets is weird and a little clumsy; I’ll explain why I chose it over Makes or ActsAs or something once we see how it’s used.

I load Begets from two separate files because one is hand-curated and the other is autogenerated – I’ll explain why that is later, when we talk about the recipe book.

.decl Composite(result : Ingredient, first : Ingredient, second : Ingredient)
.input Composite(filename="combinations", delimiter=", ")

These are our “multi-ingredient ingredients.” Think flavored syrups, basically – look in the file for some examples:

$ cat facts/combinations
cinnamon syrup, sugar, cinnamon sticks
ginger syrup, sugar, ginger
lime cordial, sugar, lime zest
...

I only bothered to support two-ingredient combinations, because that covers basically everything. But you could add another Composite3() relation or something to support more ingredients, if you wanted.

.decl IsRecipe(x : Recipe)
IsRecipe(x) :- Needs(x, _).

Now things get a little interesting: we have our first rule.

I don’t load this one from a file. Instead I’m basically saying “x is a recipe if it appears in the first ‘column’ of the Needs relation.” You could think of this as analogous to a SQL view, kinda like:

create view IsRecipe as
select name from Needs;

I made IsRecipe just as a helper, because I think it’s a lot more explicit than writing Needs(x, _) when I mean “all recipes.”

.decl IsIngredient(x : Ingredient)
IsIngredient(x) :- Needs(_, x).
IsIngredient(x) :- Begets(x, _).
IsIngredient(x) :- Begets(_, x).
IsIngredient(x) :- Composite(x, _, _).
IsIngredient(x) :- Composite(_, x, _).
IsIngredient(x) :- Composite(_, _, x).

This is another helper, but it has multiple rules. All of those are kind of “unioned” together. Basically: “x is an ingredient if it is used in a recipe or if it appears in a begets rule or in a composite description.”

We could just list out every ingredient in a file somewhere, but I think declaring the relation intensionally is a lot nicer. Also I wanted to show off that for the first time in my life I managed to use that term correctly.2

.decl Unbuyable(x : Ingredient)
.input Unbuyable(filename="unbuyable")
.input Unbuyable(filename="auto-unbuyable")

The next relation is pretty trivial: Unbuyable is a list of things like egg white that appear as ingredients but that I don’t want to appear in the final output, because I can’t actually buy egg whites at the store. Like Begets, I have some autogenerated and some hand-curated entries.

Begets(x, x) :- IsIngredient(x).
Begets(x, z) :- Begets(x, y), Begets(y, z).

Okay, now we’re getting to the good stuff.

That first rule basically says “all ingredients beget themselves.” I would like to be able to just write Begets(x, x)., but Datalog doesn’t allow that kind of “infinite” rule – it needs me to provide a domain for x, and that’s when our IsIngredient helper comes in.

This is why the relation is called “begets” instead of “makes” or “produces” or something. Originally it was called Makes, and I had awkward expressions like “if you have x or you have y such that Makes(y, x) then…” By making everything beget itself, I was able to dramatically simplify the “shopping list” calculation.

That second rule just says that Begets is transitive: if limes make lime peel, and lime peel makes lime zest, then limes make lime zest. I actually didn’t bother to write my rules at quite that level of granularity, but I could if I wanted to.

Now let’s take a look at how we actually use Begets.

Has(out) :- Has(in), Begets(in, out).

Basically, if you have an input (e.g. lemon), and the input begets something else (e.g. lemon juice), then you also have the “output.” But you can’t use the word output because it’s reserved and you’ll get a confusing error message if you try:

Error: syntax error, unexpected relation qualifier output, expecting )
in file mixologician.dl at line 32
Has(output) :- Has(in), Begets(in, output).
----^---------------------------------------
1 errors generated, evaluation aborted

You might remember that we originally loaded Has from a file, and it was just a flat list of what Datalog calls “facts.” But now we’re adding facts to it dynamically – we started out thinking of it as a table, but now it’s sort a weird table/view hybrid thing. So the SQL analogy sort of breaks down a bit.

And yes, it might be a little confusing to think about the fact that Begets(x, x), so we’re making a sort of self-referential infinite statement here: “if you have x then you have x because x begets x.” But Datalog doesn’t mind.

Begets(x, result) :- Composite(result, first, second), Has(first), Begets(x, second).
Begets(x, result) :- Composite(result, first, second), Has(second), Begets(x, first).

Here we say that one component of a “composite ingredient” begets that composite ingredient, but only if you already have the other component.

This is the first complicated rule we have, so I’m going to work up to this from simpler examples. This was the first thing I tried:

Begets("lime zest", "lime cordial") :- Has("sugar").
Begets("sugar", "lime cordial") :- Has("lime zest").

That says “lime zest begets lime cordial if you have sugar, and sugar begets lime cordial if you have lime zest.”

And this actually works pretty well – but, you know, we don’t want to write that in code. We want to load these facts from files, so we bring in the Composite relation:

Begets(first, result) :- Composite(result, first, second), Has(second).
Begets(second, result) :- Composite(result, first, second), Has(first).

That’s just a restatement of what we wrote last time, but now it applies to anything in our Composite relation.

But there’s a subtle problem with this logic. Namely: lime zest begets lime cordial, but lime does not. And I usually don’t keep lime zest stocked by itself, so according to this logic, I can’t make lime cordial even if I have lime.

So that’s the reason we have this bit of indirection: really anything that begets lime zest also begets lime cordial, as long as you also have sugar. So let’s look at the working rule in “concrete” form, without the Composite relation:

Begets(x, "lime cordial") :- Has("sugar"), Begets(x, "lime zest").
Begets(x, "lime cordial") :- Has("lime zest"), Begets(x, "sugar").

I think that’s a lot easier to read, and you can imagine applying the same Composite replacement here to get to the “full” rule above.

Note that we don’t need to get even more indirect:

Begets(x, "lime cordial") :- Has(y), Begets(y, "sugar"), Begets(x, "lime zest").
Begets(x, "lime cordial") :- Has(y), Begets(y, "lime zest"), Begets(x, "sugar").

I only mention this because I actually wrote that first, before I realized that it was unnecessary: because of the rule that Has(out) :- Has(in), Begets(in, out)., we cover this case by virtue of the Has("lime zest") bit.3

Alright. Back to the code.

.decl Missing(drink : Recipe, ingredient : Ingredient)
Missing(drink, ingredient) :- Needs(drink, ingredient), !Has(ingredient).

This is a pretty trivial helper relation – a drink is missing an ingredient if it needs it and we don’t have it. This is our first example of relation negation, which is a fun phrase.

.decl Mixable(drink : Recipe)
Mixable(drink) :- IsRecipe(drink), !Missing(drink, _).

And we use it to declare all of the drinks that we are able to mix – which is to say, all drinks that do not have any missing ingredients.

At first I wanted to just write this:

Mixable(drink) :- !Missing(drink, _).

But Soufflé rejects that: we need to restrict the domain of the relation. You can maybe think of this as trying to create a SQL table with every value that does not exist in another SQL table – you’d run out of disk space pretty quickly. Or, if you’re Soufflé, you’d run out of memory, I guess?

Error: Ungrounded variable drink in file mixologician.dl at line 41
Mixable(drink) :- !Missing(drink, _).

I think this offers an interesting insight into the nature of Datalog: although we write rules as if we’re declaring functions, in the end I think all relations need to be able to be realized as a big pile of tuples. The engine might be able to optimize out the actual realization, but it needs to be possible. I think. Like I said, I don’t actually know anything about Datalog.

.decl MixableRecipe(drink : Recipe, ingredient : Ingredient)
MixableRecipe(drink, recipe) :- Mixable(drink), Needs(drink, recipe).

This relation is very simple, and I added it the first time I actually tried to use Mixologician to find a drink to make. Instead of just a list of names, it’s exactly my recipe book filtered to the recipes that I can make – so I can search for specific ingredients, if I’m in the mood for something in particular.

But the last rule is the meatiest. So without further ado, the reason we’re all here:

.decl Enables(missing : Ingredient, drink : Recipe)
Enables(ingredient, drink) :-
  !Unbuyable(ingredient),
  Missing(drink, out),
  Begets(ingredient, out),
  count : { Missing(drink, _) } =
  count : { Begets(ingredient, product), Missing(drink, product) }
.

.output Mixable(filename="mixable")
.output MixableRecipe(filename="mixable-recipes", delimiter=" <- ")
.output Enables(filename="shopping-list", delimiter=" -> ")

And that’s the whole program. We did it.

I’m going to spend a lot of time talking about that Enables rule, so I wanted to get the .output lines out of the way first. While I was writing this program, I would frequently dump out other relations too, as a way to debug things. It was pretty useful.

But okay, let’s break this monster down:

Enables(ingredient, drink) :-
  !Unbuyable(ingredient),
  Missing(drink, out),
  Begets(ingredient, out),
  count : { Missing(drink, _) } =
  count : { Begets(ingredient, product), Missing(drink, product) }
.

The !Unbuyable(ingredient) clause is trivial – it just exists to filter out things like lime zest that we don’t want to see in our output. Let’s ignore that for now:

Enables(ingredient, drink) :-
  Missing(drink, out),
  Begets(ingredient, out),
  count : { Missing(drink, _) } =
  count : { Begets(ingredient, product), Missing(drink, product) }
.

In English: “ingredient enables you to make drink if drink is missing something that ingredient can beget, and if the only missing ingredients in drink can be begotten by ingredient.”

What? That’s not what it says at all!

Well, no, but that’s what it means. It really says “ingredient enables you to make drink if drink is missing something that ingredient can beget, and if the number of missing ingredients in drink is equal to the number of missing ingredients in drink that can be begotten by ingredient.” We’re really comparing set cardinality, not set equality, but since one set is a subset of the other, the only way they can have the same cardinality is if the sets are identical.4 We would compare set equality if we could, but as far as I can tell Soufflé can’t do that.

So hopefully that makes sense. It’s a pretty simple expression, in the end. But it took me hours to write it. Really! I think I spent multiple hours coming up with that one rule. It was extremely fun and educational, but I feel like a lot of the benefit of the journey is lost when you just jump right to the final answer.

i mean it seems obvious

Well, sure, but I want to take a few steps back, and talk about how I got here. I want you to come on a journey with me, which will end with that expression. But will start with something very different.

See, when I first started, I was not thinking in Datalog. I was thinking in “logic.” I wanted to write something like this:

Enables(ingredient, drink) :-
  !Mixable(drink),
  !Has(ingredient),
  Has(ingredient) -> Mixable(drink)
.

(That’s not valid Datalog, in case that isn’t clear.)

I wanted to be able to say “show me ingredients that I don’t currently have, but that if I had them then I would be able to mix a new drink.” That’s, like, the logical expression that I spent a long time trying to translate into Datalog, but that it just isn’t possible to say – there’s no “hypothetical” operator. Maybe I could write something like that in Prolog, though? As previously stated, I don’t know anything about logic programming.

So anyway, I quickly realized it was just not possible to express that in Datalog. So I tried to do something simpler: just find recipes that are missing exactly one ingredient. Forget all about Begets or Composites or whatever; we’ll add that back in later. For now, how do you just find recipes that are “almost” mixable?

The first thing I wanted to write was something like this:

AlmostMixable(drink) :-
  Missing(drink, ingredient), other != ingredient, !Missing(drink, other).

Which was my extremely clumsy attempt at translating the standard first-order logic “uniqueness” expression:

Missing(drink, ingredient) ∧ ∀x(Missing(drink, x) → x = ingredient)

Basically I was trying to say “drink is missing ingredient and it isn’t missing anything else.” But that doesn’t work: I was still thinking in “logic,” and not in Datalog.

Eventually, while reading the docs, I saw the count aggregate, and was able to write this – my first expression that actually told me something interesting:

AlmostMixable(drink) :-
  Missing(drink, _), count : { Missing(drink, _) } = 1.

Great.

Well, actually, first I tried to just write this:

AlmostMixable(drink) :- count : { Missing(drink, _) } = 1.

But Soufflé told me:

Error: Witness problem: argument grounded by an aggregator's inner scope is 
used ungrounded in outer scope in file mixologician.dl at line 53
AlmostMixable(drink) :- count : { Missing(drink, _) } = 1.
--------------^--------------------------------------------
1 errors generated, evaluation aborted

And I figured out that I needed the first Missing(drink, _) in order to “ground” the variable – you get it. You run into the same types of errors when you use SQL aggregate expressions – referencing variables that don’t “exist” at the same level. I don’t really know the right words to talk about this intelligibly.

Anyway, once I got there, I wanted to see the ingredient that was missing. But that was an easy change – just stop ignoring that variable:

Enables(ingredient, drink) :-
  Missing(drink, ingredient), count : { Missing(drink, _) } = 1.

And that’s great! That does tell me what drinks are missing one ingredient.

But now we need to worry about ingredients that “make” other ingredients. And at this point, all I had was a relation that looked like this:

.decl Makes(in : Ingredient, out : Ingredient)
Has(out) :- Has(in), Makes(in, out).

Basically, I didn’t have the Begets(x, x) rule yet – but we’ll get there.

So I wrote this:

Enables(ingredient, drink) :-
  Missing(drink, ingredient), count : { Missing(drink, _) } = 1.

Enables(in, drink) :- Makes(in, missing), Enables(missing, drink).

And that works for a lot of cases.

But that second Enables rule isn’t sufficient for everything: it basically says “if a drink is missing one ingredient, then acquiring anything that makes that ingredient will also allow you to make the drink.”

This works for something simple, like “limes give you lime juice so now you can make a Margarita.” But what if a drink is missing multiple ingredients, all of which could be made by acquiring a single new ingredient? This comes up in practice with the Gimlet: say you have gin and sugar, but you’re missing lime juice and lime cordial. You only need to buy limes, but the Gimlet is missing two ingredients, so it doesn’t show up.

So I needed to do something more complicated.

Here’s what I tried:

Enables(ingredient, drink) :-
  Missing(drink, ingredient), count : { Missing(drink, _) } = 1.

Enables(ingredient, drink) :-
  Missing(drink, out),
  Makes(ingredient, out),
  count : { Missing(drink, _) } =
  count : { Makes(ingredient, product), Missing(drink, product) }
.

For a while I thought this worked. It looked like it worked. So I wrote some tests. And in the process of doing so, I thought of a case it didn’t cover:

If an ingredient ever shows up both as a component of a recipe and as something that can make another component – imagine a recipe that requires both “reposado tequila” and “tequila,” for some reason – then this won’t detect that you could use reposado tequila as both of those ingredients.

My first attempt at a fix looked like this (semicolon means “or”):

Enables(ingredient, drink) :-
  Missing(drink, ingredient), count : { Missing(drink, _) } = 1.

Enables(ingredient, drink) :-
  Missing(drink, out),
  Makes(ingredient, out),
  count : { Missing(drink, _) } =
  count : { (ingredient = product; Makes(ingredient, product)), 
            Missing(drink, product)
          }
  .

But Soufflé did not like that:

Error: ERROR: disjunctions in aggregation clauses are currently not supported
1 errors generated, evaluation aborted

Which is surprising, but okay, sure, a disjunction in a count is just addition:

Enables(ingredient, drink) :-
  Missing(drink, ingredient), count : { Missing(drink, _) } = 1.

Enables(ingredient, drink) :-
  Missing(drink, out),
  Makes(ingredient, out),
  count : { Missing(drink, _) } =
    count : { Makes(ingredient, product), Missing(drink, product) }
    +
    count : { Missing(drink, ingredient) }
  .

Even though the second count will only ever be 0 or 1.

Anyway, this fixes the problem – but we still need the base case rule of “drinks with a single missing ingredient.” Why? Because of our outdated rule of Makes(ingredient, out). I originally added that to “ground” the out variable outside of the aggregate, but now it’s overly restrictive. Since we have the two aggregate clauses, we now want to say:

Enables(ingredient, drink) :-
  Missing(drink, out),
  (ingredient = out; Makes(ingredient, out)),
  count : { Missing(drink, _) } =
    count : { Makes(ingredient, product), Missing(drink, product) }
    +
    count : { Missing(drink, ingredient) }
  .

And that reduces it down to a single rule that covers both cases.

But… that’s a pretty hairy rule. And having the two different cases – “is ingredient” or “makes ingredient” – in two different places is just messy.

At this point I think it best to just quote from my diary, to show you my thought process. Yes, of course I kept a diary while I was doing this.

So what if we just said Makes("lime", "lime")., for all values of "lime"?

Error: Argument in fact is not constant in file mixologician.dl at line 10
Makes(x, x).
------^------

Well, hmm. I would like to say that, but I can sort of appreciate that this kind of statement isn’t allowed.

I find an example in the manual that declares reflexivity like this:

Makes(x, x) :- Makes(x, _).

Which is a little mind-bending, but I think it makes sense if I think about it for a bit. x makes x if x makes anything – therefore x makes x because x makes x. It’s, you know, yeah.

But that interpretation is clearly wrong: when I run it, the only things that “make” themselves are things that already have other “make” rules – so it generates the fact Makes("reposado tequila", "reposado tequila"). but not the fact Makes("tequila", "tequila").. So, okay, fine – I can write the dumb simple thing instead:

Makes(x, x) :- Needs(_, x).

Basically: anything that goes into a drink “makes” itself, as a simple way to enumerate all ingredients. This seems a little confusing, so it inspires me to make an alias:

.decl IsIngredient(x : Ingredient)
IsIngredient(x) :- Needs(_, x).

Makes(x, x) :- IsIngredient(x).

That’s a little better.

And then I rename Makes to Provides, because “tequila provides tequila” sounds more natural to me than “tequila makes tequila.” I dunno; it’s not great. Naming things is hard, especially while working in a completely unfamiliar paradigm. Maybe Begets? I like that better: as we all know, tequila begets more tequila.

Anyway, this actually works now, and allows me to simplify the Enables rule substantially:

Enables(ingredient, drink) :-
  Missing(drink, out),
  Begets(ingredient, out),
  count : { Missing(drink, _) } =
  count : { Begets(ingredient, product), Missing(drink, product) }
  .

And that’s the full story.

you know that it isn’t

No! Of course it isn’t. I did so many other dumb things. My first attempt to model this problem looked like this:

.decl Has(ingredient : symbol)
Has("lime juice") :- Has("lime")
Has("tequila") :- Has("reposado tequila")
Has("tequila") :- Has("blanco tequila")
Has("margarita") :- Has("tequila"), Has("lime juice"), Has("triple sec")

Just one relation, no types, expressing “facts” (recipes) as “rules” – and of course I wasn’t able to get very far like that. But, you know, brand new language, brand new concepts. I didn’t know what I was doing. I eventually got there, by trial and error, and by having the important insight that I should be modeling my data like I would in a relational database. Heck, even the idea to create a Makes relation instead of writing each “production rule” as a rule was a big logical leap for me.

But I can’t go through every little thought that went through my head. This is already really long. And we still need to talk about the next part: recipes.

wait did you actually write tests for a little afternoon project like some kind of psychopath

Oh. Yeah. It was great! It was my first time doing this, and it was a really nice way to convince myself that the code I was writing actually did what I wanted it to as I changed it. Obviously I didn’t start playing around with a brand new tool by writing tests for it – I’m not a monster – but when it came time to refactor and simplify and check for the sticky stuff, it was a lot nicer than commenting and uncommenting a bunch of debug lines.

And the Soufflé testing framework is suprpisingly pleasant! All I needed to do was spin up a Docker node that– no no I am kidding. There are no Soufflé tests. I just used Cram. Here’s an excerpt from the tests I wrote:

Now let's test multi-ingredient syrups.

  $ empty_bar
  $ add_recipe gimlet gin "lime juice" "lime cordial"
  $ buy gin
  $ buy "lime juice"

In order to make lime cordial, I need both limes and sugar. My lime *juice*
won't do, because I need the zest. That's two new ingredients, so neither will
show up on my shopping list.

  $ runtest
  shopping list:
  lime cordial -> gimlet

But it does tell me that I can just buy lime cordial directly (if, you know, I
could find it in a store).

But once I buy sugar...

  $ buy sugar
  $ runtest
  shopping list:
  lime cordial -> gimlet
  lime -> gimlet

It lets me know that I could either buy lime cordial directly, or I could just
buy limes.

In fact, now that I have sugar, I can get rid of my lime juice, because all I'll
need to make a gimlet is lime.

  $ sell "lime juice"
  $ runtest
  shopping list:
  lime -> gimlet

Nice. Notice that I can no longer just buy lime cordial, as that won't be
sufficient -- can't make lime juice out of lime cordial.

Not bad, right?

Cram has a few UX issues in interactive mode – you can’t split hunks; you can’t use a custom diff tool – but the idea of Cram is great, and these shortcomings are pretty minor relative to how useful it is for testing random stuff like this.

And honestly, I probably wouldn’t have done this without Nix’s help. Being able to just add python39Packages.cram to my shell.nix felt pretty magical. I don’t think I would have reached for Cram if I had to like… go through the whole rigamarole of pyenv (or asdf?) and pip and virtualenv and whatever the Python ecosystem things are these days. Nix just handed me Cram, and handled all that other nonsense for me.

But okay! That’s all. Now, if you don’t mind, let’s talk about the recipes.

but that sounds boring

Yes. The interesting part is over. We did the logic programming. You can skip directly to the part where we run it if you want.

The rest is just… scraping. But that doesn’t mean I’m not going to go through it, in case you want to do something similar.

First off, I need a source of recipes. But I already have one! For a few years now my go-to cocktail recipe book has been Tuxedo No. 2, a great little site with lots of delicious cocktails and fun stories about them and pretty pictures.

So all I need to do is type up every single recipe from that website into my little text file by hand right.

I started with wget --recursive, tastefully throttled so as not to overburden the server. Then I used pup to parse it out. You could of course use Beautiful Soup or Nokogiri or whatever you’re already comfortable with, but pup is great way to just get something done quickly without leaving the warm embrace of the command line.

Unfortunately, pup wants to give me each individual text node on its own line, when what I actually want is for it to just ignore tags and show me the flattened output.

For reference, here’s an example recipe’s markup:

$ pup -f bijou-cocktail-recipe '.recipe__recipe ul .ingredient'
<span class="ingredient">
 <a href="/ingredients/gin-cocktail-recipes">
  london dry gin
 </a>
 or
 <a href="/ingredients/gin-cocktail-recipes">
  new american gin
 </a>
</span>
<span class="ingredient">
 <a href="/ingredients/chartreuse-cocktail-recipes">
  green chartreuse
 </a>
</span>
<span class="ingredient">
 <a href="/ingredients/vermouth-cocktail-recipes">
  sweet vermouth
 </a>
 <br>
</span>
<span class="ingredient">
 <a href="/ingredients/orange-bitters-cocktail-recipes">
  orange bitters
 </a>
 <br>
</span>
<span class="ingredient">
 <a href="/ingredients/cherry-cocktail-recipes">
  cherry
 </a>
 or
 <a href="/ingredients/lemon-cocktail-recipes">
  lemon peel
 </a>
 for garnish
</span>

(I stripped out some unnecessary attributes from that to just show the structure). But if we just view the text nodes:

$ pup -f bijou-cocktail-recipe '.recipe__recipe ul .ingredient text{}'
london dry gin
 or
new american gin



green chartreuse



sweet vermouth



orange bitters


cherry
 or
lemon peel
 for garnish

We see all the text nodes that just consist of newlines and stuff. Which is, like, correct or whatever but it’s annoying.

So I wrote some horrifying sed to join up these “adjacent” lines:

$ pup -f bijou-cocktail-recipe '.recipe__recipe ul .ingredient text{}' \
| sed -En -e '/^$/d' -e ':start /^$/!{ H; n; b start }' -e 'x; s/\n//g; s/^\s+//; p'
london dry gin or new american gin
green chartreuse
sweet vermouth
orange bitters
cherry or lemon peel for garnish

what the hell was that

If you’ve never seen sed like this, well, consider yourself lucky I guess. But I’ll walk through it, because sed is pretty handy for (write-only!) things like this that are never ever going to be checked into a production codebase ever. If you don’t want to think about sed right now, you can skip to the end of this section without missing anything important.

Very roughly, -E makes regular expressions behave the way you want.5 -n means “only print lines when I tell you to,” and -e means that the next argument is a string of sed expressions. My three “expressions” are:

/^$/d
:start /^$/!{ H; n; b start }
x; s/\n//g; s/^\s+//; p

The first one just says “if the line is empty, delete it and go to the next line, restarting the program.” After s/foo/bar/, I think this is probably the most familiar sed command, so I won’t dwell on it.

The second uses a couple of lesser-known sed concepts: branching and the hold space. Basically, I’m making a loop: you could read this as:

while(current line is not empty) {
    append line to the hold space
    load the next line
}

But sed has no structured control flow statements, so I’m using branch, which is an unconditional GOTO.

You might not know about sed’s “hold space,” because there’s no reason you would have ever encountered it by accident. But basically, sed has two buffers: the hold space and the pattern space. When it reads a line, it reads it into the pattern space. Commands like s/foo/bar/ operate on the pattern space. But there are commands to sort of “stash” lines for later use. For example, here’s a sed program that moves the first line of its input to the bottom:

$ seq 1 5 | sed -ne '1h; 1!p; ${ g; p }'
2
3
4
5
1

In English: if it’s the first line (lines are 1-indexed), then copy the pattern space to the hold space. If it’s not the first line, print the pattern space. Once we reach the last line, set the pattern space to the contents of the hold space and then print it out again.

Anyway. Back to our program:

/^$/d
:start /^$/!{ H; n; b start }
x; s/\n//g; s/^\s+//; p

x swaps the hold space and the pattern space. We know that we’re on an empty line now – which is to say, the pattern space is empty – because we broke out of our loop. So this basically puts the hold space in the pattern space, and then clears the hold space.

So at this point our pattern space looks something like this:

london dry gin
 or
new american gin

Which is to say, it’s a multi-line string. So we s/\n//g to get rid of all the newlines, then strip leading whitespace, and then print out the result. And then sed implicitly grabs the next line and starts the program over.

Fun? Fun.

Pretty sure there’s a bug where if there’s no final empty line at the end of our input it will silently not output the last line. But it works for me. Also we could move the nonempty-line check to the branch statement since we know that we’re starting non-empty since that first d didn’t fire and hey let’s stop talking about sed now.6

the end of the sed section

Okay phew. Now that we have the ingredients on their own lines, we need to turn them into a format that works for our recipe. It’s pretty easy to reformat them, but first we have to think about what to do with this:

london dry gin or new american gin
green chartreuse
sweet vermouth
orange bitters
cherry or lemon peel for garnish

That’s the ingredient list for the Bijou, which I picked because it has “or” ingredients and a garnish. I’ve never tried one, but it sounds pretty good.

When I’m mixing drinks for myself, I don’t usually bother to garnish them, so I wouldn’t say that I can’t mix a Bijou just because I’m out of cherries. So I’m just going to omit any ingredient lines with “garnish” in them. This might be sad for some drinks where the garnish is very important, but, you know, my little program doesn’t need to know that.

But I still have to deal with that gin. Going into this, I had originally thought that I should just “explode” these recipes out into multiple recipes. So I’d end up with something like this:7

bijou1 -> london dry gin
bijou1 -> green chartreuse
bijou1 -> sweet vermouth
bijou1 -> orange bitters

bijou2 -> new american gin
bijou2 -> green chartreuse
bijou2 -> sweet vermouth
bijou2 -> orange bitters

But that will really mess with the “what’s the best ingredient to buy” logic. I don’t want it to say “you should buy green chartreuse because then you can make a Bijou1 and a Bijou2.” They’re still basically the same drink.

But! I already spent a lot of time on the “Begets” relation, so why not just treat “london dry gin or new american gin” as a single ingredient, and say, you know:

Begets("london dry gin",   "london dry gin or new american gin").
Begets("new american gin", "london dry gin or new american gin").

It’s spiritually equivalent to Begets("london dry gin", "gin"), it just looks a little goofier. So let’s do that.

But I can’t buy “london dry gin or new american gin” at the store. It would bother me if that showed up in the list of ingredients that I should acquire. So I mark it as an “unbuyable” ingredient, and ask my Enables relation not to show me any ingredients that I can’t actually buy. This is the birth of the Unbuyable relation that you already saw, and I’ll be the first to admit that it’s sort of a weird hack. But it works!

Anyway, I throw a for loop around it and format the output correctly:

$ for file in *-cocktail-recipe; do
> name=${file%-cocktail-recipe}
> pup -p -f "$file" '.recipe__recipe ul .ingredient text{}' \
> | sed -En -e '/^$/d' -e ':start /^$/!{ H; n; b start }' -e 'x; s/\n//g; s/^\s+//; p' \
> | xargs -n1 -d'\n' printf '%s <- %s\n' "$name"
> done
18th-century <- batavia arrack
18th-century <- creme de cacao
18th-century <- sweet vermouth
18th-century <- lime juice
20th-century <- london dry gin
...

Quite the one-liner.

So that did give me a list of all recipes, but it’s a little messy. I’ll need to clean it up a bit. Here are a few examples of lines that just won’t do:

212 <- soda water (optional)
arrack-strap <- dashes mole bitters
champagne-cocktail <- brandy , optional
gilchrist <- amaro averna or similar sweet amaro
negus-punch <- zest of 2 lemon
old-fashioned <- simple syrup, rich or 1 sugar cube
pearl-of-puebla <- sprigs of fresh oregano
pineapple-fizz <- soda water to fill
three-dots-a-dash <- dark rum (see paragraph four)
verte-chaud <- wet heavy cream to top

But this is pretty boring and easy to fix. There aren’t very many lines, so I could just go through it by hand. But I opt to write some more sed, finding weird things and inconsistencies by running sort -u on just the ingredients as I fixed things.

That was probably the most boring part of the whole project. It’s not worth walking through it; the code is out there.

And finally I have a real recipe book – or at least, a starting point. I can always add to it later; I can always incorporate other sources.

The next step is to generate rules for all the “or” instructions. This was pretty easy. Guess what I used? Yes. It’s sed. You can do anything with sed.

$ sed recipes -En -e 's/^.+<- //' -e '/ or /{ s/(.+) or (.+)/\1 -> \0\n\2 -> \0/; p }'
lillet -> lillet or cocchi americano
cocchi americano -> lillet or cocchi americano
london dry gin -> london dry gin or new american gin
new american gin -> london dry gin or new american gin
lemon juice -> lemon juice or lime juice
lime juice -> lemon juice or lime juice
whole egg -> whole egg or egg white
egg white -> whole egg or egg white
madeira -> madeira or sherry
sherry -> madeira or sherry
...

Gosh that’s beautiful, isn’t it? I hate sed so much. I save the output to a file, and then I generate the trivial list of “ingredients with ‘or’ in the name” and mark them as unbuyable.

And… that’s it!

is that it

Sigh, no, I actually need to go and catalog my bar now. Be right back…

Alright. Done. Wasn’t too bad, actually. And now, the moment of truth…

$ head results/shopping-list
pineapple -> sutter-s-mill
ginger -> penicillin
ginger -> presbyterian
ginger syrup -> presbyterian
sage -> nicholas-sage
creme de cacao -> 20th-century
creme de cacao -> brandy-alexander
creme de cacao -> green-glacier
grapefruit juice -> 212
grapefruit juice -> brown-derby

I lament my lack of grapefruit juice often. The Brown Derby is one of my all-time favorite drinks. But I have 88 lines of output here! I don’t want to read all that. What’s the best return on my investment?

$ sed results/shopping-list -E -e 's/ -> .+$//' | uniq -c | sort -nr | head
    7 peychaud's bitters
    6 dry vermouth
    5 champagne
    4 egg
    4 curacao
    3 orgeat
    3 orange
    3 green chartreuse
    3 grapefruit juice
    3 creme de cacao

Peychaud’s bitters! Well I’ll be.

$ grep '^peychaud' results/shopping-list
peychaud's bitters -> cocktail-a-la-louisianne
peychaud's bitters -> improved-whiskey-cocktail
peychaud's bitters -> monte-carlo
peychaud's bitters -> queens-park-swizzle
peychaud's bitters -> sawyer
peychaud's bitters -> sazerac
peychaud's bitters -> vieux-carre

I’d say that some of these look right up my alley.

But I can’t buy any new ingredients tonight. The hour has grown late; the stores are long closed. Tonight, I’ll have to settle for something I already have on hand. An old standby, that I always have in stock…

$ grep -q negroni results/mixable
$ echo $?
0

Cheers.


  1. Actually, it’s pretty tasty by itself. ↩︎

  2. Mnemonic: “extensional” and “enumerate” start with ‘e;’ “intensional” and “inference” start with ‘i.’ ↩︎

  3. But then why do we need to do it for the other ingredient? Because we don’t “have” that other ingredient, so the rule doesn’t fire.

    It’s the same reason you can’t just say:

    Has(result) :- Composite(result, first, second), Has(first), Has(second).
    

    Which is completely true – but not sufficient. We have to use the Begets relation to sort of… conjure up a hypothecial.

    Yes, it was very confusing for me to think about and I don’t know if this makes any sense if you didn’t go through it like I did. I’m doing my best. ↩︎

  4. For a proof of this claim, see lemma 14.3 on page 431 of Ian’s Shaky Memory of Basic Logical Concepts. ↩︎

  5. This is a complicated topic and I don’t want to go into “extended” regular expressions vs. “enhanced” regular expressions which is a real thing you have to think about if you use BSD sed like macOS does. ↩︎

  6. Or we can keep talking about sed. I golfed this after the fact to /./{H;b};x;s/^\s+|\n//gp. Can you do better? ↩︎

  7. I mean, in this particular case I feel like I could just replace “london dry gin or new american gin” with the word “gin” because I’m not really that fussy about my cocktails. But you get the general point. ↩︎