Linked lists are fun to talk about because they’re the simplest recursive data structure.
When I was giving this talk to new Ruby developers, I generally began by motivating why you should care about linked lists, since it’s not a structure that they had encountered very often. I would talk about recursion and trees and more complicated recursive data structures that show up in web programming all the time, and explain how the high-level ideas are all the same and how the DOM and class hierarchies are recursive and all that jazz.
I’m going to skip that part here because it’s boring and this is the internet. In any case the real reason I talk about linked lists in this discussion is because you can write very little code with them and still make a point.
An artificial problem
So here’s the overly simplified toy problem that we’re going to be thinking about:
- You have a linked list of strings.
- You want to count the total number of characters in the list.
For example, take this linked list of names:
┌─────┬──┐ ┌─────────┬──┐ ┌──────┬──┐
│ ian │ ───> │ ulysses │ ───> │ doug │ ───> end
└─────┴──┘ └─────────┴──┘ └──────┴──┘
If we count it up, we can see that it contains a total of 14 characters (3 + 7 + 4). But of course, we want our code to do that for us.
Representation
Let’s start with a simple representation of a linked list as a List
class that looks something like this:
class List
def initialize(data, nxt)
@data = data
@next = nxt
end
end
We can represent the above example linked list like this:
list = List.new("ian", List.new("ulysses", List.new("doug", nil)))
This might be the “obvious” representation to you, or it might seem odd. In any case, it’ll do for a little while.
Take one
Most fellows I worked with would write something like this for the length
method:
class List
def length
count = 0
node = self
while node do
count += node.data.length
node = node.next
end
count
end
end
This is a very “imperative” approach. “Do this, then do that, then until this happens do these things; update this state throughout and then return it.”
And there’s nothing wrong with this sort of code. This is not a post about one programming style being better than another. I just want to present an alternative.
Consider this one:
class List
def length
if @next
@data.len + @next.length
else
@data.len
end
end
end
This is a little bit shorter. This is the basic recursive implementation, where we break the operation up into two steps: a given node examines itself, then it hands control off to the rest of the list, then it combines the results.
But think instead about what it’s saying: “If we’re the last node, then our length is just the length of the string we have. If we’re not the last node, then our length is the length of our string plus the length of the rest of the list.”
This is slightly more declarative. Rather than declaring the steps needed to calculate the answer, we’re just saying “this is what the answer is.”
But… that’s not really true. In fact this is just as imperative as before, if you look at it right. We could interpret this as “First check if we have a next node. If we don’t, just return the length of our string. If we do, then first access the length of our string, then invoke the length
method on our next
value, then add the two and return the result.” After all, we’re still programming – we’re still telling the computer what to do.
I might call this “more declarative,” but that’s such a fuzzy word that you could argue this is just as “imperative” as before. And you’d be right.
But let’s keep going anyway.
Towards elegance
I don’t like the if @next
check. But if we don’t have it, the program will crash when we try to evaluate nil.length
. How can we fix that?
Well, for starters, using nil
as our terminal value is gross: it means that we cannot have an “empty” list that really behaves like a list – for an empty list would just be nil
, and nil
has no length
method.
So let’s use a better representation.
class List
end
class ListMiddle < List
attr_accessor :data, :next
def initialize(data, nxt)
@data = data
@next = nxt
end
end
class ListEnd < List
end
list = ListMiddle.new("ian",
ListMiddle.new("ulysses",
ListMiddle.new("doug",
ListEnd.new)))
We break up the representation of a linked list into two types of nodes: intermediate or initial nodes, which contain data, and the sentinel node, which takes the place of nil
but doesn’t really contain any data.
The advantage of doing this is that we can define methods on ListEnd
so that we can have empty lists:
class ListEnd
def length
0
end
end
class ListMiddle
def length
@data.length + @next.length
end
end
And this way we don’t need to treat the end of the list specially: the length of ListMiddle
is always the length of our string plus the length of the rest of us. We rely on Ruby’s dynamic dispatch to decide which implementation of length
gets invoked.
Again, I would call this code quite “declarative,” even more so than before. The length of the empty linked list is zero, full stop. The length of a list with data is the length of its string plus the length of the rest of the list.
But still, we’re also saying: access the length, invoke this method, add the two numbers and return them…
More declarative
In real Ruby code, when we deal with collections, we’re probably dealing with Enumerable
objects. So let’s extend our List
to become Enumerable
.
class List
include Enumerable
end
class ListMiddle < List
attr_accessor :data, :next
def initialize(data, nxt)
@data = data
@next = nxt
end
def each(&block)
yield @data
@next.each(&block)
end
end
class ListEnd < List
def each
end
end
All we have to do is mixin Enumerable
and define each
, which yield
s all of its values to the block provided. Since ListEnd
has no values, it doesn’t yield
anything, and ListMiddle
uses a recursive implementation similar to our recursive length
from before. Don’t worry too much about this; just know that we can now do things like this:
list.map {|x| x.length}
And use all of the other familiar Enumerable
methods.
How does this help us? Well, apart from being more canonical, it lets us think about our algorithm in more generic terms that would work on a linked list or an array or whatever other Enumerable
container we throw at it.
What we’re really doing is looking at every element in the collection, combining them somehow, and then returning a single value. We’re reducing a collection of values into a single summary value.
It turns out this “type” of algorithm crops up all the time, and there’s no need to manually define a recursive implementation whenever we want to do something like this. Instead, there’s a helper – the aptly named reduce
– that we have access to because we’ve mixed in Enumerable
. And we can use that to define length
instead:
class List
def length
self.reduce(0) do |sum, data|
sum + data.length
end
end
end
This is a bit more declarative to me: even though we’re still specifying what the computer should do (namely, invoke reduce
with this block), to us as humans we can read this as “length is a reduction that sums the length of the data
elements.”
Note that we don’t need two implementations of length
anymore – our two implementations of each
are sufficient. But you can sort of “see” our previous implementations in the reduction. The body is very similar to ListMiddle#length
, and the 0
from ListEnd#length
is still there.
Where do we stop? This implementation is perhaps not as declarative as it could be. Consider an alternative:
class List
def length
self.map(&:length).reduce(0, :+)
end
end
Here we’ve nearly reduced the problem to its essence. We realize that we don’t really care about reducing a collection of strings; we’re really just reducing a collection of numbers. So first we turn the collection of strings into a collection of numbers with map
, then we reduce them, with 0
as the “empty” case, and +
as the reduction operation.
But we’ve still written that “in order to execute length
, you must execute map
and then reduce
and return the result.” There is still an imperative interpretation that will never go away.
It would be nice if we could get even more declarative than this. Wouldn’t it be nice to say length
is the map
/reduce
sequence seen here? Not that the body of this algorithm invokes these other helpers, but that it is the same as those?
I don’t know how to do that in Ruby. I am not a Ruby programmer; it is likely possible using some of the metaprogramming facilities. But generally composing methods like map
and reduce
is difficult and rather inelegant, and whatever more declarative approach exists would probably be much longer.
Pointless programming
In Haskell, we can write something slightly more declarative than this. But let’s start with a baseline translation of what we had in Ruby:
totalLength list = foldl (+) 0 (map length list)
This is very similar to our Ruby implementation, despite some superficial differences. We’re declaring a function, not a method, which takes as its argument a linked list of strings. We have to use a new name because Haskell doesn’t allow us to overload length
, and reduce
is called foldl
– but otherwise you can squint and call it the same. 0
is still there as the base case; we write (+)
instead of :+
; map(&:length)
becomes merely map length
. We don’t use dot notation because Haskell doesn’t have methods, only functions.
Now I want to make it even more declarative. Rather than saying “totalLength
is a function that takes list and invokes map length
and foldl
on it,” I want to say that totalLength
is merely the combination of those two things.
totalLength = foldl (+) 0 . map length
Haskell lets us write code like this – a function that appears at first glance to take no arguments, but that actually behaves exactly the same as before. What it says is that totalLength
is the composition (.
being the function composition operator) of map length
and foldl (+) 0
. totalLength
is merely the name that we’re giving to the operation of extracting the lengths of the strings and then summing them.
In fact Haskell has in its standard library a helper function called sum
, which lets us write an equivalent definition that is perhaps easier for non-Haskellers to read:
totalLength = sum . map length
This is fairly declarative.
Code like this is called point-free in the Haskell world.
Point-free style was very foreign to me and hard for me to understand when I first started playing with Haskell. But practicing reading point-free definitions has really changed the way I approach implementation – it really compels you to think in chunks, to recognize patterns and opportunities for simplification.
What’s the point of all of this
The point of this isn’t to say that imperative code is bad and that declarative code is good and you should try to make your implementations more declarative or something. Sometimes that makes sense; sometimes it doesn’t.
While I’ve been guiding the discussion towards a more declarative implementation, I don’t mean to say that it’s better, or some goal we should aspire to. Just that my original audience would usually start on the imperative end, so to discuss the entire spectrum it was necessary to move away from that.
Imperative and declarative are interesting distinctions that exist in our minds. They’re fuzzy and imprecise – they really underscore the room for creativity in the world of programming. I’ve found the declarative mode of thinking to be really helpful in guiding the way I write code, and you might find the same if you haven’t tried it before.