My mentor Colin wrote one of the most
popular posts on the 8th Light blog, on
the somewhat complex process of requiring Clojure
libs.
It’s not a stretch to give it some of the credit for my interest in
Clojure: I came close to giving up in frustration the first time I
tried to import a library in a Clojure REPL. It was so easy in
Python, and there I was, suddenly dealing with
macros, keywords, quotes, and–heaven forfend!–all those parentheses.
For all the details on namespaces, require, and the ns macro, see
Colin’s
post,
which is still the authoritative source. But if you’re a Python
programmer looking for a quick reference, here’s the Python-Clojure
Rosetta stone I went looking for the first time I stared, deadpan,
into my Clojure REPL.
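A rough sketch of the kind of mapping I mean, using a made-up my.lib
namespace (Colin’s post has the real details):

;; Python: import my.lib              Clojure: (require 'my.lib)
;; Python: import my.lib as ml        Clojure: (require '[my.lib :as ml])
;; Python: from my.lib import f, g    Clojure: (require '[my.lib :refer [f g]])

;; Inside an ns declaration, the same forms move into :require:
(ns my.app
  (:require [my.lib :as ml :refer [f g]]))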
Now that we’ve built an efficient functional
queue,
we can finally put it to work in a breadth-first numbering solution.
First, let’s define a few trees up front for testing. The first is the
example given in the introduction of the
paper.
The other two are a little trickier, and the twelve-node tree is not binary.
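The representation is plain nested lists: each node is a list whose
car is its label and whose cdr is its children, and a leaf is just the
string "leaf". As a sketch, the example tree from the paper looks like
this (five-nodes and twelve-nodes follow the same pattern):

(define example-tree
  '("A" ("B" "leaf" ("C" "leaf" "leaf"))
        ("D" "leaf" "leaf")))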
To calculate the visit order, we can perform a simple breadth-first
traversal of the
tree. Now that we have a queue, the solution is pretty close to the pseudocode: Start with the
root node in the queue, and assign it a number. Then recur with three
new parameters: A queue with this node’s children inserted, a list
with this node’s number consed on, and an incremented node number.
When the queue is empty, we’re done:
(define visit-order
  (lambda (tree)
    (define bfs-iter
      (lambda (queue visited n)
        (if (null? (rem queue))
            visited
            (let ((node (car (rem queue)))
                  (new-q (car (cdr (rem queue)))))
              (if (equal? "leaf" node)
                  (bfs-iter new-q visited n)
                  (bfs-iter (ins-items (cdr node) new-q)
                            (cons (cons (car node) n) visited)
                            (+ 1 n)))))))
    (bfs-iter (ins tree empty-q) '() 1)))

(check-equal? (visit-order example-tree)
              '(("C" . 4) ("D" . 3) ("B" . 2) ("A" . 1))
              "Visit-order searches a tree breadth-first and returns a list of nodes and their numbers.")

(check-equal? (visit-order five-nodes)
              '(("E" . 5) ("D" . 4) ("C" . 3) ("B" . 2) ("A" . 1)))

(check-equal? (visit-order twelve-nodes)
              '(("L" . 12) ("K" . 11) ("J" . 10) ("I" . 9) ("H" . 8) ("G" . 7)
                ("F" . 6) ("E" . 5) ("D" . 4) ("C" . 3) ("B" . 2) ("A" . 1))
              "Visit-order works on non-binary trees.")
It’s possible to map a visit queue back to a binary tree (check out
Michael’s concise Haskell
solution), but I
wanted a solution that would work for all trees. In the end, I settled for performing a second depth-first
traversal
to label nodes. This walk-map function works like a recursive map,
traversing a nested structure and applying a function to every
element that’s not a list:
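A minimal sketch of the idea:

;; A sketch of walk-map: recurse into sublists, apply f to everything else.
(define walk-map
  (lambda (f tree)
    (cond ((null? tree) '())
          ((pair? tree) (cons (walk-map f (car tree))
                              (walk-map f (cdr tree))))
          (else (f tree)))))

(check-equal? (walk-map (lambda (x) (* 10 x)) '(1 (2 3) 4))
              '(10 (20 30) 40)
              "Walk-map applies a function to every non-list element of a nested structure.")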
On the plus side, this solution generalizes to non-binary trees, and
is built almost entirely out of Scheme primitives. It’s not as concise
or efficient as I’d like, but I’m happy with my lazy lists and
functional queue, even if the implementation is a little long. You can
find an edited version of my solution
here, and all the code
from these posts as a Gist here.
On the way to a breadth-first
numbering
solution, I’ve built a simple functional queue and created lazy
lists
from Scheme primitives (and a couple handy macros). In this post,
I’ll bring the pieces together to create an improved queue. (And yes,
I promise I’ll actually start numbering trees sometime soon).
Our simple queue consisted of two lists: one for the head of the
queue, and a reversed one for the tail:
'((1 2 3) (5 4))
Our improved queue makes two changes: lazy lists and incremental
reversal. The printed form looks a little complicated, because each
side’s lazy list shows up as a pair with a delayed procedure in its
cdr. But just like the simple queue, it’s still a list of a head and a
reversed tail, with each side also storing the length of its list.
Built up explicitly, it looks like this:
(list (cons (llist '(1 2 3)) 3)
      (cons (llist '(5 4)) 2))
To start implementing the improvements, we need to update the
selectors to get the lists and lengths from both sides:
(define empty-q (list (cons '() 0) (cons '() 0)))

(define five-items
  (list (cons (llist '(1 2)) 2)
        (cons (llist '(3 4 5)) 3)))

(define lhs-len
  (lambda (queue)
    (cdr (left-side queue))))

(check-equal? (lhs-len five-items) 2
              "Our lazy queue stores the length of the lists on each side.")

(define rhs-len
  (lambda (queue)
    (cdr (right-side queue))))

(check-equal? (rhs-len five-items) 3
              "Our lazy queue stores the length of the lists on each side.")

(define lhs-list
  (lambda (queue)
    (car (left-side queue))))

(check-equal? (lcar (lhs-list five-items))
              (lcar (llist '(1 2))))

(define rhs-list
  (lambda (queue)
    (car (right-side queue))))

(check-equal? (lcar (rhs-list five-items))
              (lcar (llist '(3 4 5))))
Now we can write an updated insert function:
(define ins
  (lambda (item queue)
    (list (left-side queue)
          (cons (lcons item (rhs-list queue))
                (+ 1 (rhs-len queue))))))

(let ((three-items (ins 3 (ins 2 (ins 1 empty-q))))
      (six-items (ins 6 (ins 5 (ins 4 (ins 3 (ins 2 (ins 1 empty-q))))))))
  (check-equal? (take-n 3 (rhs-list three-items)) '(3 2 1))
  (check-equal? (take-n 3 (rhs-list six-items)) '(6 5 4)
                "Ins adds elements to the right side."))
Remove is a little more complicated. In the simple queue, we simply
swapped and reversed the right side. We want our improved queue to
avoid reversing long right side lists. The solution is incremental
reversal: rebalance the queue every time an element is removed.
In Okasaki’s implementation,
this is done with functions called make-queue and rotate. Below
are my Scheme translations.
Rotate reverses the right side list and concatenates it to the left.
It’s similar to the simple queue implementation, but it uses lazy
list operators and it’s designed to work incrementally:
(define rotate
  (lambda (left right)
    (define rotate-recur
      (lambda (left right accumulator)
        (if (null? left)
            (lcons (lcar right) accumulator)
            (lcons (lcar left)
                   (rotate-recur (lcdr left)
                                 (lcdr right)
                                 (lcons (lcar right) accumulator))))))
    (rotate-recur left right '())))

(let ((rotated (rotate (lhs-list five-items) (rhs-list five-items))))
  (check-equal? (take-n 5 rotated) '(1 2 5 4 3)
                "Rotate reverses the right side list and concatenates it to the left."))
Make-queue implements the incremental reversal logic. Now, we no
longer wait until the head list is empty to swap and reverse. Instead,
we rotate the queue as soon as the tail list contains one more element
than the head. This keeps the queue balanced, and ensures that we
won’t run into an expensive reversal:
(define make-queue
  (lambda (left right)
    (if (<= (cdr right) (cdr left))
        (list left right)
        (list (cons (rotate (car left) (car right))
                    (+ (cdr left) (cdr right)))
              (cons '() 0)))))

(let ((rebalanced (make-queue (left-side five-items) (right-side five-items))))
  (check-equal? (take-n 5 (lhs-list rebalanced)) '(1 2 5 4 3))
  (check-equal? (rhs-list rebalanced) '()
                "Make-queue rebalances the queue when the right side is longer than the left."))
To maintain a balanced queue, we’ll want to call make-queue on
insertion and removal. Here’s an improved insert, and a new remove:
(define ins
  (lambda (item queue)
    (make-queue (left-side queue)
                (cons (lcons item (rhs-list queue))
                      (+ 1 (rhs-len queue))))))

(let ((three-items (ins 3 (ins 2 (ins 1 empty-q))))
      (six-items (ins 6 (ins 5 (ins 4 (ins 3 (ins 2 (ins 1 empty-q))))))))
  (check-equal? (take-n 3 (lhs-list three-items)) '(1 2 3))
  (check-equal? (take-n 3 (lhs-list six-items)) '(1 2 3))
  (check-equal? (take-n 3 (rhs-list six-items)) '(6 5 4)
                "Ins adds elements to the right side and rebalances if it's longer than the left."))

(define rem
  (lambda (queue)
    (if (and (null? (lhs-list queue))
             (null? (rhs-list queue)))
        '()
        (list (lcar (lhs-list queue))
              (make-queue (cons (lcdr (car (left-side queue)))
                                (- (lhs-len queue) 1))
                          (right-side queue))))))

(let ((removed (rem (ins 4 (ins 3 (ins 2 (ins 1 empty-q)))))))
  (check-equal? (car removed) 1)
  (check-equal? (take-n 2 (lhs-list (cadr removed))) '(2 3))
  (check-equal? (take-n 1 (rhs-list (cadr removed))) '(4)
                "Rem returns a pair: the element removed from the queue and the new queue."))
Finally, let’s add a couple convenience functions to insert and remove
multiple items:
(define ins-items
  (lambda (items queue)
    (if (null? items)
        queue
        (ins-items (cdr items) (ins (car items) queue)))))

(let ((seven-items (ins-items '(1 2 3 4 5 6 7) empty-q)))
  (check-equal? (take-n 7 (lhs-list seven-items)) '(1 2 3 4 5 6 7)
                "Ins-items adds multiple items to the queue."))

(define rem-n
  (lambda (n queue)
    (define rem-n-iter
      (lambda (n queue items)
        (if (= 0 n)
            (cons (reverse items) queue)
            (rem-n-iter (- n 1)
                        (car (cdr (rem queue)))
                        (cons (car (rem queue)) items)))))
    (rem-n-iter n queue '())))

(let ((remove-four (rem-n 4 (ins-items '(1 2 3 4 5 6 7) empty-q))))
  (check-equal? (car remove-four) '(1 2 3 4))
  (check-equal? (+ (lhs-len (cdr remove-four))
                   (rhs-len (cdr remove-four)))
                3
                "Rem-n returns a list of removed items and the new queue."))
Next time, we’ll finally bring everything together to solve the
breadth-first numbering problem.
In my last
post,
I wrote a simple functional queue as a stepping stone towards a
breadth-first-numbering
solution. By pairing an ordered list for the front of the queue with a
reversed one for the back, it’s cheap and easy to enqueue and dequeue
items. A simple queue is fine for this toy example, but it’s
inefficient, since it requires reversing the tail list whenever the
head list is empty. We can do better with two improvements: lazy lists
and incremental reversal.
Clojure’s lazy seqs seemed powerful and mysterious until I read
through chapter
3
of SICP. Building a lazy list is based on two simple operations,
delay and force:
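Here’s a minimal sketch of both (Racket has delay and force built in,
so these definitions shadow them):

;; (delay form) wraps form in a thunk instead of evaluating it.
(define-syntax delay
  (lambda (stx)
    (syntax-case stx ()
      ((_ form) #'(lambda () form)))))

;; force just calls the thunk.
(define force
  (lambda (delayed)
    (delayed)))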
Delay wraps a form in an anonymous function of no arguments. The
result can be stored and passed around like any other value, but the
computation “stored” inside won’t run until it’s evaluated. Force is
the opposite of delay, forcing a delayed form to evaluate. From these
two basics, we can build a lazy version of cons:
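A sketch of lcons as a syntax-case macro (the pattern matching is
explained below):

;; (lcons item items) becomes (cons item (delay items)):
;; the head is evaluated now, the tail is delayed.
(define-syntax lcons
  (lambda (stx)
    (syntax-case stx ()
      ((_ item items) #'(cons item (delay items))))))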
Racket’s macro system uses syntax-case
macros,
which are a little different from the comma-spliced defmacro beasts you know
and love from Common Lisp and Clojure. In addition to enforcing good
hygiene,
the syntax-case macro system works by pattern matching against syntax
objects. In lcons, any form that matches the pattern (lcons item
items) is mapped to (cons item (delay items)). In the delay
macro above, anything matching (delay form) maps to (lambda ()
form). We’re still defining the ways we want to
change the syntax of our program, but the transformation is applied
at a different level: to syntax objects instead of raw s-expressions.
With lcons finished, it’s easy to create lazy lists:
(define llist
  (lambda (items)
    (if (null? items)
        '()
        (lcons (car items)
               (llist (cdr items))))))

(define lazy (llist '(1 2 3 4 5)))

(check-pred pair? lazy
            "A lazy list is a pair: the head of the list and a delayed function.")

(check-equal? (car lazy) 1
              "A lazy list stores its head as the first item of a pair.")

(check-pred procedure? (cdr lazy)
            "A lazy list stores a delayed function as the second item of a pair.")

(check-equal? (car ((cdr lazy))) 2
              "Evaluating the delayed function returns the next item in the list.")
Just as eager lists are composed of nested pairs, lazy lists are
composed of nested, delayed pairs. Like a normal list, the car of each
pair is the list item, and the cdr represents the rest of the list.
But the cdr doesn’t return the next pair until it’s evaluated:
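For example, evaluating the lazy list and then its forced cdr at the
REPL looks something like this (the exact procedure printout will
vary):

> lazy
'(1 . #<procedure>)
> ((cdr lazy))
'(2 . #<procedure>)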
With this behavior in mind, we can write lazy car and lazy cdr:
(define lcar
  (lambda (llist)
    (car llist)))

(check-equal? (lcar lazy) 1
              "Lazy-car is just like regular car!")

(define lcdr
  (lambda (llist)
    (force (cdr llist))))

(check-equal? (car (lcdr lazy)) 2
              "Lazy-cdr forces evaluation of the next list element.")
A take-n function is also handy for converting lazy lists back to
eager ones:
(define take-n
  (lambda (n lazy-list)
    (if (= 0 n)
        '()
        (cons (lcar lazy-list)
              (take-n (- n 1) (lcdr lazy-list))))))

(check-equal? (take-n 4 (llist '(1 2 3 4 5 6 7)))
              '(1 2 3 4)
              "Take-n returns the first n elements of a lazy list.")
And that’s it! We’ve written all the basics necessary for lazy lists.
(For a few more lazy-fied list operations, see section 3.5.1
of SICP).
Finally, we should make one important optimization. Over the course of
list operations like lcdr, the same delayed form can be called many
times. If the delayed computation is simple, this won’t be noticeably
inefficient. In our case, we’re just storing values that are
immediately returned (integers in these examples, and eventually some
node representation in our numbering solution). But there’s no
guarantee that delayed computations will be cheap! We could put
functions in a lazy list just as easily:
> (lcons (read-book "Finnegans Wake")
         (lcons (read-book "In Search of Lost Time")
                (lcons (read-book "Gravity's Rainbow") '())))
'("riverrun, past Eve and Adam's..." . #<procedure:...read.scm:10:5>)
In practice, we should memoize lazy computations, so subsequent calls
look up their previously computed values. It’s an easy fix:
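The standard trick is SICP’s memo-proc: wrap each thunk so that the
first call caches its result and later calls just return it. A sketch:

;; memo-proc runs the thunk once, then returns the cached result afterwards.
(define memo-proc
  (lambda (proc)
    (let ((already-run? #f)
          (result #f))
      (lambda ()
        (if already-run?
            result
            (begin (set! result (proc))
                   (set! already-run? #t)
                   result))))))

;; delay then expands to (memo-proc (lambda () form)) instead of (lambda () form).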
I spent some time this weekend (okay fine, most of Saturday and
Sunday afternoon) on an exercise Michael
Baker shared on our “geeks” mailing
list. The problem is a functional
pearl
from Chris Okasaki: given a binary tree, reproduce a structurally
identical tree with its nodes numbered in breadth-first order.
For example, numbering this tree:
       a
      / \
     /   \
    b     d
   / \   / \
  .   c .   .
     / \
    .   .
Should yield this tree:
       1
      / \
     /   \
    2     3
   / \   / \
  .   4 .   .
     / \
    .   .
If you’ve ever solved a search problem, this might sound stupid easy.
But getting the details of a functional solution right can be a challenge. As Okasaki
puts it in the paper:
…I presented the problem to many other
functional programmers and was continually amazed at the
baroque solutions I received in reply. With only a single
exception, everyone who came near a workable answer went
in a very different direction from my solution right from the
very beginning of the design process. I gradually realized
that I was witnessing some sort of mass mental block, a
communal blind spot, that was steering programmers away
from what seemed to be a very natural solution.
Before you read my baroque solution, you might want to try for
yourself. I’ll wait.
Although I love Clojure, using built-in queues
and lazy seqs felt like cheating. So I chose
to use Racket with
Rackunit,
and tried to use as many primitives as possible.
Breadth-first traversal is easy with a queue, but an efficient
functional queue can be tricky. Consing an element to the front of a
Scheme list is cheap, but appending is expensive—it requires “cdring
down” over all the elements. One solution (cribbed from Okasaki
himself)
is to represent a queue as a pair of lists. The list on the left is
the head of the queue, so elements can be popped off in O(1) time. The
right side represents the rest of the elements in reverse, so
elements can be pushed on to the end in constant time. Here are the first
steps towards an implementation: an empty queue with left and right selectors.
(define simple-q '((1 2 3) (6 5 4)))

(define empty-queue '(() ()))

(check-equal? empty-queue '(() ())
              "An empty queue is a list containing two empty lists.")

(define right-side
  (lambda (queue)
    (car (cdr queue))))

(check-equal? (right-side simple-q) '(6 5 4)
              "The right side of a queue is the second list.")

(define left-side
  (lambda (queue)
    (car queue)))

(check-equal? (left-side simple-q) '(1 2 3)
              "The left side of a queue is the first list.")
Inserting an item conses it on to the right-side list:
(define insert
  (lambda (item queue)
    (list (left-side queue)
          (cons item (right-side queue)))))

(check-equal? (insert 7 simple-q)
              '((1 2 3) (7 6 5 4))
              "Inserting an element adds it to the beginning of the right side list.")
To dequeue an item, “remove” it from the left side with car, and
return a new queue, with the cdr of the left side list:
(define remove
  (lambda (queue)
    (list (car (left-side queue))
          (list (cdr (left-side queue))
                (right-side queue)))))

(check-equal? (remove simple-q)
              '(1 ((2 3) (6 5 4)))
              "Removing an element returns a pair: the removed element and the new queue.")
When the left side is out of elements, reverse the right side list,
and swap it with the left. Here’s the buildup to swap-and-reverse-car:
(define swap
  (lambda (queue)
    (list (right-side queue)
          (left-side queue))))

(check-equal? (swap simple-q)
              '((6 5 4) (1 2 3))
              "The right side and left side can be swapped.")

(define reverse
  (lambda (items)
    (if (null? (cdr items))
        items
        (append (reverse (cdr items))
                (list (car items))))))

(check-equal? (reverse (right-side simple-q))
              (list 4 5 6)
              "A list's elements can be reversed.")

(define reverse-car
  (lambda (items)
    (cons (reverse (car items))
          (cdr items))))

(check-equal? (reverse-car '((1 2) (3 4)))
              '((2 1) (3 4))
              "The first item in a list can be reversed.")

(define swap-and-reverse-car
  (lambda (queue)
    (reverse-car (swap queue))))

(check-equal? (swap-and-reverse-car '(() (6 5 4)))
              '((4 5 6) ())
              "Swap and reverse-car can be composed to swap sides, then reverse the left.")
Now we can write a dequeue function that really works:
(define remove
  (lambda (queue)
    (if (null? (left-side queue))
        (remove (swap-and-reverse-car queue))
        (list (car (left-side queue))
              (list (cdr (left-side queue))
                    (right-side queue))))))

(check-equal? (remove '(() (6 5 4)))
              '(4 ((5 6) ()))
              "To remove an element when the left side is empty, swap and reverse, then try again.")

(check-equal? (remove simple-q)
              '(1 ((2 3) (6 5 4)))
              "Removing an element returns a pair: the removed element and the new queue.")
That’s all it takes to build a simple functional queue. Unfortunately,
it’s not very efficient. Reversing a list is the kind of O(n)
operation we built our queue to avoid in the first place, but if many
more items are inserted than removed, we’ll end up reversing and
swapping a lot. We can do better—and I’ll explain how in my next post.
Tonight I attended a meetup on Cascalog, a Clojure DSL built on top of the Hadoop MapReduce framework. (Slides here and code here if you’d like to check it out yourself).
Running huge, complicated queries with a few lines of code was
awesome, but my Hadoop installation made a lot of noise whenever I
tried a query in the REPL using the ?<- query executor, printing
lots of unwanted log info to
stdout. (I was using Hadoop installed via Homebrew instead of the
readme recommendation).
Fortunately, it’s easy to hush the logger a little by running queries
inside cascalog.io/with-log-level. Here’s a quick two-line macro that
wraps calls to ?<- in with-log-level to quiet down Hadoop:
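Something like this sketch (quiet?<- is a made-up name, and I’m
assuming with-log-level takes a log-level keyword like :fatal):

(ns quiet.queries
  (:require [cascalog.api :refer [?<-]]
            [cascalog.io :refer [with-log-level]]))

;; Wrap ?<- so Hadoop only reports serious problems while the query runs.
(defmacro quiet?<- [& forms]
  `(with-log-level :fatal
     (?<- ~@forms)))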
I was puzzled. Reading raw POST requests across a network socket
returned headers, but never body content. When I mentioned it to Colin
and showed him the method I suspected was misbehaving, he asked me a
question: “Why does that exist? Which test brought that line of code
into being?”
In a perfect TDD world, this question always has a good answer. In my
case, it didn’t: this was a complicated line that was
twice removed from the test that “created” it. But the path forward
was clear right away: write a test for this line, in isolation, the
way I should have done from the start.
Good test-driven development requires a lot of restraint and self-discipline, and lines
like my faulty byte reader are guaranteed to sneak in if I break one
of the laws of TDD—even if it’s only a couple lines of
support code that aren’t written just to pass a test, or a test that
looks comprehensive but is really trying to cover too much at once. This
question is a great way to hunt them down and fix them: ask every line
of code you write to justify its existence.
As a young software engineer, I learned three variables by which to
manage projects: speed, quality, and price. The sponsor gets to fix
two of these variables and the team gets to estimate the third. If the
plan is unacceptable, the negotiating starts.
This model doesn’t work well in practice. Time and costs are
generally set outside the project. That leaves quality as the only
variable you can manipulate. Lowering the quality of your work doesn’t
eliminate work, it just shifts it later so delays are not clearly your
responsibility. You can create the illusion of progress this way, but
you pay in reduced satisfaction and damaged relationships.
Satisfaction comes from doing quality work.
The variable left out of this model is scope. If we make scope explicit, then we have a safe
way to adapt, we have a safe way to negotiate, we have a limit to
ridiculous and unnecessary demands.
I spent all last week working on my web
server. It’s a fun project
so far, filled with the joy of taking something apart and looking
inside to see how it works, but it’s also been a challenge: I had a
long checklist of features to implement and only a week to get them
all working.
Even though I was still frantically coding on the train to work the
day of my demo, I managed to check off all the boxes. Like the 100% test
coverage my project reported, 100% box coverage felt great—like the
satisfaction of crossing the last item off a long to-do list. But as
any test-driven developer knows, even 100% test coverage can’t
guarantee that a project will work. This week I learned that box coverage is the same: ticking
off features is no guarantee of quality.
Sure, my server met the requirements, but much of the code wasn’t
pretty, and I knew it. And though I was proud of the progress I made and looking
forward to showing off my work, the demo went
off the rails early on, when the server hung and crashed trying to handle
concurrent connections. (If you’re thinking of using Thread.run(),
you probably want Thread.start(), by the way). In an instant, all
the little details I’d put effort into—nice-looking directory pages,
support for extra headers and obscure content types, clean request
parsing under the hood—were outweighed by one big defect.
The attitude towards quality at 8th Light is clear: quality is
non-negotiable, we will never ship software with known defects, and
when an unknown one slips by, we’ll fix it for free. That leaves scope
as the only free variable in the planning and development process.
Although the scope of my web server project was already explicit, I
didn’t do a good job negotiating. In retrospect, it’s clear that
showing a clean, stable server that only handles GET requests is a
greater accomplishment than one with extra bells and whistles that’s
prone to random catastrophic failure. But it sure felt good to check
off all those boxes.
I’ve learned two lessons over the last week: first, quality and stability matter most. Never sacrifice
quality, and never ever tolerate unstable code. Second, renegotiating
and giving feedback is part of making scope explicit. Trading off
quality for features is guaranteed to be a bad bargain.
Working on my HTTP server has required me to learn a little more about
the technology that makes the web work. At the outset, I had only the vaguest notion
of what a socket was, but once I saw a diagram of the TCP/IP layer
model,
it was clear—a socket is an abstraction. (More on the model
here).
I was reminded right away of the diagram in Chapter
2.1
of SICP (the section on building
a rational number arithmetic system). It’s not a coincidence:
enforcing abstraction between layers is one reason the Internet and
TCP/IP are such powerful tools. Any system that does anything
interesting is necessarily composed of smaller
parts. Whether they’re functions, objects, or
sentences, the way
they’re put together
matters.
But equally important is the way they interact, and how they’re separated.
(Network protocols give good
advice on this,
too.)
“Very few writers really know what they are doing until they’ve done it. Nor do
they go about their business feeling dewy and thrilled. They do not type a few stiff
warm-up sentences and then find themselves bounding along like huskies across the
snow. One writer I know tells me that he sits down every morning and says to
himself nicely, “It’s not like you don’t have a choice, because you do – you can
either type, or kill yourself.” We all often feel like we are pulling teeth, even those
writers whose prose ends up being the most natural and fluid. The right words and
sentences just do not come pouring out like ticker tape most of the time. […] For me and
most of the other writers I know, writing is not rapturous. In fact, the only way I can get
anything written at all is to write really, really shitty first drafts.”
This is supposed to be a professional blog, but I hope you’ll pardon
this bit of language, which comes with some of the best and dearest
advice on professionalism I’ve read.
At the end of last week, I moved on to the second project of my
apprenticeship: writing a simple HTTP server from scratch.
Starting the project was not rapturous. I knew the basics—model the
filesystem, connect over a socket, and transfer specially-formatted
text back and forth—but had no idea where to start or what test to
write first. I read up on sockets, browsed through the HTTP spec,
and stared dumbly into IntelliJ for a while, and at long last, I
started typing. What Anne Lamott calls “shitty first drafts” the TDD
world calls
“spikes”—short
experiments, sometimes without tests, to figure out what to work on
next. A spike is like a “child’s draft,” allowed to run wild and
break things on condition that it will be thrown out and replaced
with something decent (and well-tested).
Of course, as soon as I had a few lines of code down, the ideas
started to flow (I started by parsing headers, by the way), and
within an hour I had a feature-complete HTCPCP server (HTTP is still a work in progress). Another reminder that programming is a craft, and writing code really is a creative act.