Linked lists are fun to talk about because they’re the simplest recursive data structure.

When I was giving this talk to new Ruby developers, I generally began by motivating why you should care about linked lists, since it’s not a structure that they had encountered very often. I would talk about recursion and trees and more complicated recursive data structures that show up in web programming all the time, and explain how the high-level ideas are all the same and how the DOM and class hierarchies are recursive and all that jazz.

I’m going to skip that part here because it’s boring and this is the internet. In any case the real reason I talk about linked lists in this discussion is because you can write very little code with them and still make a point.

An artificial problem

So here’s the overly simplified toy problem that we’re going to be thinking about:

  • You have a linked list of strings.
  • You want to count the total number of characters in the list.

For example, take this linked list of names:

┌─────┬──┐   ┌─────────┬──┐   ┌──────┬──┐
│ ian │ ───> │ ulysses │ ───> │ doug │ ───> end
└─────┴──┘   └─────────┴──┘   └──────┴──┘

If we count it up, we can see that it contains a total of 14 characters (3 + 7 + 4). But of course, we want our code to do that for us.

Representation

Let’s start with a simple representation of a linked list as a List class that looks something like this:

class List
  def initialize(data, nxt)
    @data = data
    @next = nxt
  end
end

We can represent the above example linked list like this:

list = List.new("ian", List.new("ulysses", List.new("doug", nil)))

This might be the “obvious” representation to you, or it might seem odd. In any case, it’ll do for a little while.

Take one

Most fellows I worked with would write something like this for the length method:

class List
  def length
    count = 0
    node = self
    while node do
      count += node.data.length
      node = node.next
    end
    count
  end
end

This is a very “imperative” approach. “Do this, then do that, then until this happens do these things; update this state throughout and then return it.”

And there’s nothing wrong with this sort of code. This is not a post about one programming style being better than another. I just want to present an alternative.

Consider this one:

class List
  def length
    if @next
      @data.len + @next.length
    else
      @data.len
    end
  end
end

This is a little bit shorter. This is the basic recursive implementation, where we break the operation up into two steps: a given node examines itself, then it hands control off to the rest of the list, then it combines the results.

But think instead about what it’s saying: “If we’re the last node, then our length is just the length of the string we have. If we’re not the last node, then our length is the length of our string plus the length of the rest of the list.”

This is slightly more declarative. Rather than declaring the steps needed to calculate the answer, we’re just saying “this is what the answer is.”

But… that’s not really true. In fact this is just as imperative as before, if you look at it right. We could interpret this as “First check if we have a next node. If we don’t, just return the length of our string. If we do, then first access the length of our string, then invoke the length method on our next value, then add the two and return the result.” After all, we’re still programming – we’re still telling the computer what to do.

I might call this “more declarative,” but that’s such a fuzzy word that you could argue this is just as “imperative” as before. And you’d be right.

But let’s keep going anyway.

Towards elegance

I don’t like the if @next check. But if we don’t have it, the program will crash when we try to evaluate nil.length. How can we fix that?

Well, for starters, using nil as our terminal value is gross: it means that we cannot have an “empty” list that really behaves like a list – for an empty list would just be nil, and nil has no length method.

So let’s use a better representation.

class List
end

class ListMiddle < List
  attr_accessor :data, :next
  def initialize(data, nxt)
    @data = data
    @next = nxt
  end
end

class ListEnd < List
end

list = ListMiddle.new("ian",
       ListMiddle.new("ulysses",
       ListMiddle.new("doug",
       ListEnd.new)))

We break up the representation of a linked list into two types of nodes: intermediate or initial nodes, which contain data, and the sentinel node, which takes the place of nil but doesn’t really contain any data.

The advantage of doing this is that we can define methods on ListEnd so that we can have empty lists:

class ListEnd
  def length
    0
  end
end

class ListMiddle
  def length
    @data.length + @next.length
  end
end

And this way we don’t need to treat the end of the list specially: the length of ListMiddle is always the length of our string plus the length of the rest of us. We rely on Ruby’s dynamic dispatch to decide which implementation of length gets invoked.

Again, I would call this code quite “declarative,” even more so than before. The length of the empty linked list is zero, full stop. The length of a list with data is the length of its string plus the length of the rest of the list.

But still, we’re also saying: access the length, invoke this method, add the two numbers and return them…

More declarative

In real Ruby code, when we deal with collections, we’re probably dealing with Enumerable objects. So let’s extend our List to become Enumerable.

class List
    include Enumerable
end

class ListMiddle < List
  attr_accessor :data, :next
  def initialize(data, nxt)
    @data = data
    @next = nxt
  end

  def each(&block)
    yield @data
    @next.each(&block)
  end
end

class ListEnd < List
  def each
  end
end

All we have to do is mixin Enumerable and define each, which yields all of its values to the block provided. Since ListEnd has no values, it doesn’t yield anything, and ListMiddle uses a recursive implementation similar to our recursive length from before. Don’t worry too much about this; just know that we can now do things like this:

list.map {|x| x.length}

And use all of the other familiar Enumerable methods.

How does this help us? Well, apart from being more canonical, it lets us think about our algorithm in more generic terms that would work on a linked list or an array or whatever other Enumerable container we throw at it.

What we’re really doing is looking at every element in the collection, combining them somehow, and then returning a single value. We’re reducing a collection of values into a single summary value.

It turns out this “type” of algorithm crops up all the time, and there’s no need to manually define a recursive implementation whenever we want to do something like this. Instead, there’s a helper – the aptly named reduce – that we have access to because we’ve mixed in Enumerable. And we can use that to define length instead:

class List
  def length
    self.reduce(0) do |sum, data|
      sum + data.length
    end
  end
end

This is a bit more declarative to me: even though we’re still specifying what the computer should do (namely, invoke reduce with this block), to us as humans we can read this as “length is a reduction that sums the length of the data elements.”

Note that we don’t need two implementations of length anymore – our two implementations of each are sufficient. But you can sort of “see” our previous implementations in the reduction. The body is very similar to ListMiddle#length, and the 0 from ListEnd#length is still there.

Where do we stop? This implementation is perhaps not as declarative as it could be. Consider an alternative:

class List
  def length
    self.map(&:length).reduce(0, :+)
  end
end

Here we’ve nearly reduced the problem to its essence. We realize that we don’t really care about reducing a collection of strings; we’re really just reducing a collection of numbers. So first we turn the collection of strings into a collection of numbers with map, then we reduce them, with 0 as the “empty” case, and + as the reduction operation.

But we’ve still written that “in order to execute length, you must execute map and then reduce and return the result.” There is still an imperative interpretation that will never go away.

It would be nice if we could get even more declarative than this. Wouldn’t it be nice to say length is the map/reduce sequence seen here? Not that the body of this algorithm invokes these other helpers, but that it is the same as those?

I don’t know how to do that in Ruby. I am not a Ruby programmer; it is likely possible using some of the metaprogramming facilities. But generally composing methods like map and reduce is difficult and rather inelegant, and whatever more declarative approach exists would probably be much longer.

Pointless programming

In Haskell, we can write something slightly more declarative than this. But let’s start with a baseline translation of what we had in Ruby:

totalLength list = foldl (+) 0 (map length list)

This is very similar to our Ruby implementation, despite some superficial differences. We’re declaring a function, not a method, which takes as its argument a linked list of strings. We have to use a new name because Haskell doesn’t allow us to overload length, and reduce is called foldl – but otherwise you can squint and call it the same. 0 is still there as the base case; we write (+) instead of :+; map(&:length) becomes merely map length. We don’t use dot notation because Haskell doesn’t have methods, only functions.

Now I want to make it even more declarative. Rather than saying “totalLength is a function that takes list and invokes map length and foldl on it,” I want to say that totalLength is merely the combination of those two things.

totalLength = foldl (+) 0 . map length

Haskell lets us write code like this – a function that appears at first glance to take no arguments, but that actually behaves exactly the same as before. What it says is that totalLength is the composition (. being the function composition operator) of map length and foldl (+) 0. totalLength is merely the name that we’re giving to the operation of extracting the lengths of the strings and then summing them.

In fact Haskell has in its standard library a helper function called sum, which lets us write an equivalent definition that is perhaps easier for non-Haskellers to read:

totalLength = sum . map length

This is fairly declarative.

Code like this is called point-free in the Haskell world.

Point-free style was very foreign to me and hard for me to understand when I first started playing with Haskell. But practicing reading point-free definitions has really changed the way I approach implementation – it really compels you to think in chunks, to recognize patterns and opportunities for simplification.

What’s the point of all of this

The point of this isn’t to say that imperative code is bad and that declarative code is good and you should try to make your implementations more declarative or something. Sometimes that makes sense; sometimes it doesn’t.

While I’ve been guiding the discussion towards a more declarative implementation, I don’t mean to say that it’s better, or some goal we should aspire to. Just that my original audience would usually start on the imperative end, so to discuss the entire spectrum it was necessary to move away from that.

Imperative and declarative are interesting distinctions that exist in our minds. They’re fuzzy and imprecise – they really underscore the room for creativity in the world of programming. I’ve found the declarative mode of thinking to be really helpful in guiding the way I write code, and you might find the same if you haven’t tried it before.