Persistent Trees in git, Clojure and CouchDB

This is a tale of three images. I found these images while investigating the internals of several different applications. There are some really neat software projects emerging at the moment, and as a developer I always find it interesting to take a look at the implementation details, because there is often a lot to be learned. It’s not always something you might need right now, but maybe a few years down the line you may be confronted with a similar problem. Plus – in my opinion – knowing a bit about the internals of a program helps reasoning about its behaviour.

Exhibit A: git repositories

git trees Persistent Trees in git, Clojure and CouchDB

Let’s start with the first image. This one is taken from Scott Chacon’s talk “Getting Git”. (Actually I had to mix slides 138 and 142 together to better fit the blog format.) I’ll try not to go to much into the details of git here, listen to the talk instead of you want to know more. The thing I do want to talk about though is git’s tree structures. These tree structures describe the project contents (both directory and file) for one specific commit. These trees are completely immutable (this is ensured by SHA-1 hashes). A new commit creates what amounts to a completely new tree. But usually the changes in each commit are only a fraction of the whole tree. Since a lot of the sub-trees of the original tree have not changed (and are immutable), these can be safely recycled. Only files and folders that have changed need to be added. Note that this means parent nodes of changed nodes have to be created as well, recursively up to the tree root. But even though the tree itself is new (as evidenced by the new root) the additional space requirements are quite low, on the order of approximately O(log n). This makes for extremely compact repository format, and an astonishingly simple one.
Another thing I want to point out that the only element that needs to be mutated is the reference to the root of the tree, everything else is immutable.

Exhibit B: Clojure’s persistent data structures

clojure trees Persistent Trees in git, Clojure and CouchDB

Next up is a picture lifted from Rich Hickey’s “Clojure Concurrency” talk. If you are interested in figuring out ways to deal with the impending multi-core age, I highly recommend you take a look at this talk. The concepts and techniques are not exclusive to Clojure, so even if you are not into Lisp’s you might want to watch it.

The way clojure deals with concurrency (in a nutshell) is that every core data structure (list, trees, hash maps, etc) is immutable. This has one big perceived drawback: Immutable data structures incur a huge overhead because of all the copying involved in creating new instances. But this is only a problem if the immutability is implemented naively. Analysis shows that usually only a part of a data structure is actually modified at any given time. This again allows recycling large parts of the the “old” data structure and only creating part of it new, and just pointing to the bits of the old structure that have not changed. As these are immutable as well, it is completely safe to do so. In the picture the two green boxes are the root nodes of two data structures sharing large parts of their trees.

Since a program that only deals with static, immutable data is pretty boring, there are mechanisms for introducing changes in a program. These mechanisms are called references, and these do mutate. But the runtime ensures that these changes happen in a controlled and well behaved manner, e.g. in the form of Software Transactional Memory (STM) or agents. But everything else is essentially immutable. Note that this has the neat side effect that reading process are inherently thread-safe since nothing can suddenly change while you are looking at it, i.e. no more ConcurrentModificationExceptions.

Exhibit C: CouchDB’s append-only file format

couchdb trees Persistent Trees in git, Clojure and CouchDB

The third picture is from a blog post by Ricky Ho titled “NoSQL patterns”. CouchDB has made quite a few waves recently, so I wanted to see what makes it tick under the hood. In keeping with the Erlang philosophy of robustness and failure tolerance, couchdb uses an append-only file format. This means that all data written is not modified any further and there is thus little chance for corruption. CouchDB uses a B+ Tree to index its entries, so again we have a tree structure. And because the tree entries live inside an append only file, these trees are immutable as well. When making a change to the database, the new data is written along with all changes to nodes in the B+ Tree. Since the B+ Tree is very flat, that means even for databases with millions of entries the number of updated tree nodes is fairly small. The most recent root of the index tree can always be found at the very end of the file, and thus the database file is read from the end backwards. If any kind of failure occurs while writing changes, the software can see that the last entry is corrupt and can seek further back until it finds a “good” entry and proceed from there. Another interesting benefit is that reading processes don’t block writing processes and vice versa. A reading process can just find the latest root and then work from there. Because all references go backward in the file, and everything is immutable there is no contention. Meanwhile a writing process can happily append at the end of the file without disrupting any readers.

Further exhibits

There are probably more examples of this pattern to be found. The oldest reference I could find is “The Design and Implementation of a Log-Structured File System” written by Mendel Rosenblum and John K. Ousterhout in 1991. The most recent file system effort seems to be NILFS. The flash-based SSDs that are currently entering the market are also rumored to use something along these lines internally. I also suspect that a few of the traditional DBMS might use similar data structures under the hood

Shared characteristics

Although all the applications do very different things and come from completely different backgrounds, they share a common data structure underneath. Unfortunately I have not been able to find a well published moniker for this pattern. If anyone does, please let me know.

Update: As FuncFan pointed out in the comments the moniker I was looking for was “Purely functional trees“. Thanks for the quick reply!

Immutable, recycling trees

All three systems have these immutable trees in some way way or form that allow sharing structure. In git these are tree objects, in Clojure they are called persistent data structures, and in CouchDB they are part of the internal index tree.

Mutable references

To allow for changes in an otherwise immutable world, all systems allow for mutable reference constructs. All change is encapsulated in these references, which are called exactly that in both git and Clojure. In CouchDB the story is a little different. Here the “reference” to the latest tree is simply the end of the database file

More common ground

Apart from these “primary” characteristics there are other shared features among these three systems, that are a direct result of the underlying data structure.

Garbage collection

Because data may be shared between data structures, it is often not safe to delete all children when a root node is no longer needed. Instead a garbage collection mechanism is needed to free unused structures. Git does have a special command that does exactly that, Clojure uses the Garbage Collection in the JVM implicitly, and CouchDB offers a compact operation, which removes old versions and tree indices.

Versioning

Because no structure is changed, it is trivially easy to keep old versions around, and since a lot of data is shared it is usually fairly cheap in terms of memory to do so. Versioning is the primary application of git, so no big surprises on that end. In Clojure it is also fairly easy to add versioning for things like an “undo” buffer: Just keep a list of the old objects around. CouchDB also offers some lightweight versioning out-of-box, but it is mostly used for replication. But it should be fairly simple to add more sophisticated versioning features.

Concurrency Safety

The immutability properties of all three system make reasoning about concurrent changes and processes a lot simpler. Reading is trivial because all that is read is set in stone, so to speak. Writing is made easier by the fact that there is only a single point – the reference – that is modified to effect a specific change. This critical point can then be guarded with traditional synchronization primitives, without having to worry about the rest of the data structure.

Conclusion

I have to admit I found it a bit eerie seeing a single pattern pop up in so many different places. It may be just an idea whose time has come (similar to Garbage Collection a few years back). Or it may be just that now we simply have enough memory and computing resources at our disposal, which allows us to never have to delete something from the record, but instead only add incrementally. And oftentimes the history of data is just as interesting as the data itself. The benefits of immutability may also be a good way to tame the concurrency beast in the years to come.

I hope you enjoyed this comparative trip into the internals of these software systems. Again, check out the two talks and the blog post, they are well worth the time.

12 Responses to “Persistent Trees in git, Clojure and CouchDB”

  1. FuncFan says:

    These are examples of a purely functional tree, as first published by Chris Okasaki. See the wikipedia page for both examples, and references to his thesis and resulting book. http://en.wikipedia.org/wiki/Purely_functional#Trees

  2. Dan Fabulich says:

    Only one problem: git trees aren’t immutable! You can use the rebase command to mutate them; most alarmingly, you can delete them. (Deleted commits lurk in git’s “reflog” for a limited period of time; then the garbage collector purges them.)

    It is entirely possible to lose history in this way, or, at a minimum, cause a lot of work for other developers. http://kernel.org/pub/software/scm/git/docs/git-rebase.html See “RECOVERING FROM UPSTREAM REBASE.”

  3. Manuel Woelker says:

    @FuncFan: Thanks for putting a name to the face! I added a small update to the post.

    @Dan Fabulich: Actually the git tree objects themselves are immutable. What rebase does is create a new series of trees, that are changed version of the originals; but the original trees remain untouched. Only the HEAD reference now points to the new history.

    Rebasing being a bit of work is a Good Thing IMHO. For one, it discourages excessive use of the feature. But it also makes it impossible to secretly “spirit away” certain parts of the history. Every downstream repository is going to know that history has been mucked with, and can verify independently what changes were made with relative ease.

    Note that rebasing does have its legitimate uses. One for example are local branches that might need a bit of pruning or polishing. Another use case are situations where certain files have to be removed for legal reasons (e.g. license incompatibilities). C.f. https://bugs.eclipse.org/bugs/show_bug.cgi?id=270852

    But again, the reason that the rebase is so much work is exactly the fact that the trees themselves (and thus their SHA-1 identifiers) are immutable.

  4. I’d also suggest that RESTful architectures have a link to this pattern. The RESTful approach conceptually doesn’t have an update – PUT is equivalent to a replace, which maps on to the append-only approach very nicely.

  5. Robert Virding says:

    Immutable data and trees are old, much older @FuncFan than Chris Okasaki, although he has written a very good description of them. They are a fundamental part of functional and logic programming.

  6. I’ve been following Clojure for a few months and find it’s data structures very intriguing. Thanks for this article… I do hope it’s an idea whose time has come.

  7. Jorge Vargas says:

    Really interesting post. Thank you. I have been puzzling with that patter for some time. Now I have a description and a name!

  8. I believe Apache Cassandra also implements some form of LSM storage.

  9. Frando says:

    These are all directed acylclic graphs (http://en.wikipedia.org/wiki/Directed_acyclic_graph).

  10. D'gou says:

    It is interesting to note that Mercurial, another DVCS, uses the same goodness of “append only” that CouchDB uses. I don’t know the timeline for CouchDB, but my impression is that Mercurial was doing “append only” before then. But “append only” itself is a very old technique.

  11. Daniel Waterworth says:

    Hi,

    I wondered if you’d be interested in aodbm. It’s an append only database in the style of dbm. Here’s it’s url, https://sourceforge.net/projects/aodbm/ .

12 responses so far

Written by . Published in Categories: Planet Eclipse

Author:
Published:
Dec 13th, 2009