I’ve been reading Okasaki’s Purely Functional Data Structures with work friends at TVision, and decided to get a little overly ambitious with the exercises on Binomial Heaps in chapter 3. This is a curious data structure with some unusual invariants which the author maintains by storing extra information in the nodes. Then he shows how some of the information can be inferred from the structure, saving space. It occurred to me that it all can actually be tracked in the types, saving even more space. Or so it seemed, anyway. I had to give it a try.

If you want to skip the intro and take a look at the code I came up with, here’s the gist.

The data structure is built up in two steps. First, a Binomial Tree is a tree with an interesting recursive structure:

  1. A rank-n tree contains a root value and n child trees
  2. Each child is a binomial tree, with ranks from 0 to n - 1
  3. It follows that a rank-n tree contains 2ⁿ values (i.e. it’s always full)

You can represent this with a simple ADT which might look like this in Haskell:

data BinomialTree a
  = Node a [BinomialTree a]

But that shape doesn’t express the invariants about the rank of each tree and its list of children, so to help keep track of that Okasaki adds the rank to each node:

data BinomialTree a
  = Node Int a [BinomialTree a]

That way operations on these trees can check and maintain their ranks. But the compiler isn’t helping; a bug in any function can construct a non-sensical tree. With a little bit of dependent typing, we can do better:

import GHC.TypeLits
...
data BinomialTree a (rank :: Nat) 
  = Node a (SizedList (BinomialTree a) rank)

Notice that a tree’s rank now appears only in the tree’s type. SizedList is a list in which each cons records the length of the list, and it also requires the types of the list’s elements to be a matching series (see the gist for the details). For example, a BinomialTree Int 2 contains a SizedList … 2 whose elements are BinomialTree Int 1 and BinomialTree Int 0, all of which matches the invariants given above. This works pretty well; the compiler is now able to check that the ranks of a Node and its children are related in the expected way. If you try to put one together wrong, the compiler helpfully points out your mistake (well, you get an error that says something like expected type 0 doesn't match actual type 1…) So far, so good.

Next, the Binomial Heap data structure is built on Binomial Trees, and mirrors them in an odd way: A Binomial Heap is a (flat) list of trees with increasing ranks. The author’s type is again (too) simple: type BinomialHeap = [BinomialTree]. Again the trees are expected to form a series of increasing rank, except this time there can be gaps in the sequence.

Jumping to the chase, here’s a strong type for these lists:

type BinomialHeap a
  = RankedList (MaybeF (BinomialTree a)) 0

What’s interesting here is RankedList, which is a kind of opposite of SizedList; the head of a RankedList shares the list’s overall type (e.g. 0, in a heap), and the tail’s type is one greater. This type works out in the same way, making it impossible to construct a value in the BinomialHeap type that violates the invariants. Note: there’s no need to track any kind of overall rank for the heap, it’s always a list of trees starting from rank 0.

One of the fun things that comes up is the observation that the children of a Binomial Tree, when reversed, form a valid Binomial Heap. For that to work out, I needed to be able to write reverseSized :: SizedList f len -> RankedList f 0 (that is, given a SizedList containing elements with types [f len-1, ... f 0], construct a RankedList containing the same elements starting from f 0.) It wasn’t obvious to me that that would even work, and it was a challenge to get right, but the result is wonderfully simple.

I wonder: what would an opposite reversing function look like? reverseRanked :: RankedList f 0 -> SizedList f len won’t work, because the RankedList might not have the right number of elements for the desired type. I can think of a few possibilities; might need to hack on this some more sometime.

There were more fun puzzles to solve along the way. Overall, I think Haskell’s facilities for dependent typing worked pretty well here. I don’t think I want to write code like this all the time, but if I was implementing a tricky data structure like this for production use, I would definitely give it a try.