In Clean Code, Uncle Bob proposes two rules for good functions: “The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.” Useful rules, but Clojure requires one more: they’re still not small enough.
Functional languages and immutable data make reasoning easy by making functions simpler. Functions take input, transform it, and return new output. Data passes through functions, flowing rather than mutating. Complicated functions make simple hard, and they can be dangerously easy to write.
Hyperbole aside, there really are two simple rules for functions: they should be small, and do one thing. In his presentation on functions, Uncle Bob describes a simple algorithm for cleaning up crufty functions: extract until there’s nothing left to extract.
Instead of a contrived wombat example, I’ll use one of my own disgusting old 4Clojure solutions as an example of atrocious code. (But cut me some slack! I was young and naive.)
Here’s the problem:
“Write a function which takes a collection of integers as an argument. Return the count of how many elements are smaller than the sum of their squared component digits. For example: 10 is larger than 1 squared plus 0 squared; whereas 15 is smaller than 1 squared plus 5 squared.”
And here’s my answer (hide the children):
Like nested blocks in other languages, code that sprawls rightward
indicates a problem—and it can happen fast in Clojure.
To start, we’ll extract lt-sqd-components
from the let
binding.
(This is a common, awful 4Clojure hack for defining a named function
inside an anonymous one, though the discerning 4Clojurist uses letfn
).
The original function is almost readable, but we can do better. It
looks like I didn’t understand filter
when I wrote this: the extra
map
is redundant since lt-sqd-components
is already a predicate function that
returns true
or false
.
This does one thing, so let’s clean it up and move on. It needs a name, and the function we’re filtering against needs a question mark.
And now the recursive step. Let’s look at the terribly-named
lt-sqd-components
. Each line in its let
binding does something
different. One splits a number into a sequence of its digits:
One squares every element in a sequence:
And one takes the sum of the collection.
One more function to extract: the let
binding should be its own
function. One might argue that this function does one thing—all it
does is check whether a number is less than the sum of its squared components!
But it’s operating on several different levels of abstraction: digits,
a sequence of digits, and their sum. A helpful guideline is limiting
functions to one level of abstraction. In this case, the function
should only know about the sum.
Despite its dumb name, lt-sqd-components?
is doing one thing. Let’s
clean it up. I prefer “digits” to “components”, and it should use defn
.
On to sum-of-squared-digits
. We can transform the let
binding into a function using the
threading macro (as suggested in the comments on my last post).
We can do better. I don’t like the intermediate square-all
step,
which should be hidden in sum-of-squares
:
Extract the function literal in square-all
. I’ve got a great name
for it:
And there’s only one function left: splitting a number into a sequence of digits. Let’s extract and name the function literal:
And finally, clean it up by using Integer/parseInt
instead of hacky
subtraction:
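Here is a sketch of where the refactoring ends up, reconstructed from the prose above (not the original code; the names follow the ones discussed):

```clojure
;; Reconstruction; names follow the prose, not the lost originals.
(defn char->digit [c]
  (Integer/parseInt (str c)))

(defn digits [n]
  (map char->digit (str n)))

(defn square [x]
  (* x x))

(defn sum-of-squares [coll]
  (reduce + (map square coll)))

(defn sum-of-squared-digits [n]
  (sum-of-squares (digits n)))

(defn less-than-squared-digits? [n]
  (< n (sum-of-squared-digits n)))

(defn count-less-than-squared-digits [coll]
  (count (filter less-than-squared-digits? coll)))
```

For the examples from the problem statement, (less-than-squared-digits? 15) is true and (less-than-squared-digits? 10) is false.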
And there it is—clean, readable functions at all levels of abstraction, minimal nesting, and nothing longer than three lines. Starting from the top, low-level functions build into bigger abstractions through combination and composition. Each step is easy to read and comprehend.
As Uncle Bob puts it in Clean Code:
Master programmers think of systems as stories to be told rather than programs to be written. They use the facilities of their chosen programming language to construct a much richer and more expressive language that can be used to tell that story. Part of that domain-specific language is the hierarchy of functions that describe all the actions that take place within that system. In an artful act of recursion, those actions are written to use the very domain-specific language they define to tell their own small part of the story.
Extract. Simplify. Recur. Take the time to consider each line, and clean code comes naturally.
I used getUserMedia() and feature detection to make a scavenger hunt clue accessible only by holding up a special image in front of the camera.
But before the giddy magic of HTML5 comes a terrifying request from the browser and a terrible user experience. Chrome “wants to use your location.” For what? And how much does it learn? Firefox asks if you’d “like to share your camera.” For how long? There’s no explanation of why, or of what the site will do with the data. Users’ most likely questions are met only with the chance to “Deny” or “Accept” an unknown contract. Perhaps explanations belong on the page itself, but the ability to attach even a simple message to the prompt would be a huge improvement.
There’s an existing W3C draft on feature permissions, which punts the idea to web notifications. As users get used to fine-grained permissions on other devices, browsers will need to catch up.
Clean Code is worth the cover price for Chapter 2 alone. Its advice is simple: use meaningful, clear names that reveal intent. This rule probably seems obvious, but the value is in its side effects. Taking the time to scrutinize every name requires the sort of mindfulness and thought that produces clean code. In addition to Uncle Bob’s general guidelines for good names, here are a few Clojure-specific rules on naming.
Clojure’s categorical imperative: act in the Kingdom of
Verbs.
Functions do things, and their names should describe the things they
do. This is usually an easy rule to follow, but functions that build
or return data structures can be tricky. Make-user-map
is better
than user-data
. Render-footer
is better than footer
alone.
Verbs are great, but they’re even greater when they have objects. A name like
remove-temporary-files
is much clearer than clean-up
.
Nouns are also useful inside functions. I find my tolerance for
repetition far lower in Clojure than in other languages: if I use an expression more
than once, I’ll almost always put it in a let
binding and give it a
name. Inside functions that compose multiple transformations on some
data structure, extracting intermediate steps into values in a let
binding can be very helpful.
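A hypothetical example of the pattern (not from the original post): each intermediate transformation gets a name in the let binding.

```clojure
(defn order-summary [orders]
  (let [completed   (filter :completed? orders)
        totals      (map :total completed)
        grand-total (reduce + totals)]
    {:count (count completed)
     :total grand-total}))
```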
Good nouns are also helpful when destructuring
values, which is awesomely useful but sometimes hard to read. Prefer
putting them in let
bindings to cramming them in the argument list,
except for very simple cases.
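A hypothetical illustration: the destructuring reads more clearly in a let than crammed into the argument vector.

```clojure
(defn describe-user [user]
  (let [{:keys [name email]} user]
    (str name " <" email ">")))
```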
The one first-class exception to verbs everywhere is adjectives for
predicates (functions that return true
or false
, like odd?
and
seq?
). These should always end in question marks and always return
either true
or false
.
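A hypothetical predicate along those lines:

```clojure
;; Ends in a question mark and returns true or false.
(defn palindrome? [coll]
  (= (seq coll) (reverse coll)))
```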
Clojure has a large set of core functions, and sometimes the clearest name for a function will collide with one of them. Use it anyway! This is why namespaces are useful. Similarly, don’t worry if the best name is a long one–it’s easy to rebind it to a new name when required.
That said, make sure it really is the best name. Long names often
indicate functions that can be split: invert-and-multiply
and
find-and-replace
should probably be split in two. (Hint: and
is a
great clue). If a function’s name collides with a core function or
incorporates a common name, it should act the same way: if table-map
doesn’t apply a function to every cell in a table, it has the wrong name.
The Clojure style guide, Clojuredocs examples and Clojure’s own library coding standards are good resources for picking up common Clojure idioms and vocabulary. Here are a few naming conventions.
In macros, expr
is usually used for a single expression and body
for a longer form.
“Collection” is often shortened to coll
:
Bundling up extra arguments is almost always done with & more
.
Like in middle school math, n
is usually an integer, x
and y
are
default numerical inputs, and f
, g
, and h
are often functions.
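Some hypothetical signatures illustrating these conventions (none are from the original post):

```clojure
(defmacro unless [expr & body]   ; expr: a single form; body: many
  `(when-not ~expr ~@body))

(defn longest [coll]             ; coll: any collection
  (apply max-key count coll))

(defn log [msg & more]           ; & more: extra arguments
  (apply println msg more))

(defn iterate-n [f x n]          ; f: a function, x: input, n: a count
  (nth (iterate f x) n))
```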
Dynamic vars wear *earmuffs*
. Try not to use them.
Simple format transformations often use an arrow, e.g.: cljs->clj
,
html->hiccup
, hex->bytes
.
Clojure does a great job separating value, state, and
identity.
Clojure programmers should, too. If a function changes state or has
side effects, its name should reflect it. Functions that mutate
state like swap!
and reset!
end with a bang. Side effects hiding
elsewhere should also be explicit: if format-page
saves a file to
disk, it should be format-and-save-page
(or even better, two
separate functions).
UPDATE: See also the Clojure Style Guide, a concise, comprehensive community-driven document with many more guidelines on clean Clojure.
Not long ago, most software seemed static and untouchable: I treated the libraries and tools I used as artifacts of “real programmers” imbued with a special aura. Slowly, over the last few months, that category has dissolved. Most of the code I touch is no doubt still written by people smarter than me, but suddenly it seems malleable and open for extension.
This, I think, is an underrated benefit of test-driven code. Even when I still feel like I’m faking it, I have tests that tell the truth: the code works as intended. But for learning to test, I don’t think I’d have developed the confidence to jump into an established project and commit without fear. Writing docs and working from open issues are a good start, but to really contribute, the first step is writing tests.
A good user story creates “business value.” But note the “business.” There are many story-like tasks that create lots of value for developers. Refactoring crufty code into something more stable and extensible always extends its lifespan. Speeding up the test suite might mean writing twice as fast. A new design pattern or library behind the scenes can eliminate scores of future headaches. And yet none of these improvements are visible, and none really create business value. They all have second-order effects on business value, usually on scales longer than one week, but the immediate effects are invisible.
It’s easy to spend a whole week on pseudostories with nothing visible to show. (I’ve certainly done it!) And it’s easy to feel resentful when lots of work looks like nothing new at all. But consider a client’s perspective and invisible efficiencies start to look pretty lame: imagine a carpenter showing off her rad new nailgun (twice as fast as a hammer!) instead of the house she’s supposed to be building.
Unseen improvements are hugely helpful, but they don’t count if they aren’t visible. That doesn’t mean they can’t be made to count. It can require creativity, but it’s possible to make unseen improvements more obvious–show, don’t tell, as the writer’s adage goes. Show how using a repository makes it easy to swap in new data sources. Show the new views that reuse carefully extracted components. Show a new feature that Clojurescript made possible. Setting visible goals along the way makes unseen value clear.
As I’ve archived year after year of new messages, I’ve become less comfortable storing my entire indexed email history on someone else’s servers, where it can be scanned and searched at will. And yet in practice, I’ve always traded off privacy against convention and convenience.
In part, this is because it’s long been a discontinuous decision: even a small amount of extra control or privacy required giving up all the modern conveniences of webmail at once. No desktop client came close to the features of Gmail, so I never made the switch. But now that I spend most of my time in a terminal, I’ve finally found a client that provides a pretty good compromise: sup.
Sup is not something I’d set up for my Mom, but Rubyists and Unix geeks will feel right at home. It’s a curses-based mail client written in Ruby with excellent full-text search out of the box. In addition to offering archiving, labels, and search, it’s built on top of extensible tools like offlineimap, msmtp, and gpg, and scriptable in Ruby.
The official docs are very good (and thus don’t need to be repeated here); I found it most helpful to work through them in order.
For now, sup doesn’t do two-way IMAP syncing, so messages I archive stay on my machine. In my case this is a feature, not a bug: I now have a permanent searchable archive stored locally that won’t change if I delete messages from my Gmail account. I can keep the last few weeks of mail on the Gmail server, accessible from my phone and the web (and seriously, when was the last time you looked at an email more than a month old?), and everything else securely archived on my own drive.
It would be a mistake to consider my sup setup much more private than plain Gmail. My messages still travel across the internet in cleartext and pile up in my correspondents’ inboxes. (Sup integrates nicely with GPG for ad hoc encryption). They’re probably still stored in Google backups and no doubt snarfed and sent off to Bluffdale. I am probably not paranoid enough, but I believe Google’s claims that they really do erase deleted messages, and keeping my own archive raises the cost of compromising or reconstructing my entire history by a little bit, without sacrificing the features I’ve become so dependent upon.
If you’re using Leiningen 2, it’s as easy as:
$ lein new specljs <your project name>
Leiningen will download the template from Clojars automatically.
To start the Speclj autorunner from inside the project directory:
$ lein spec -a
Specljs tests are configured to run whenever the ClojureScript compiles. To watch for changes, rebuild and run tests automatically, start the cljsbuild auto-compiler:
$ lein cljsbuild auto
To run specljs tests once:
$ lein cljsbuild test
If you’re using pre-2.0 Leiningen, you can find the template on Clojars and the source on GitHub.
Every tool in my pentesting kit depends on VirtualBox. Working in virtual machines keeps my security tools separate from my development environment, and allows me to practice attacking hideously vulnerable applications in quarantine. VirtualBox includes excellent network configuration options, including completely virtualized local networks that make it easy to keep things compartmentalized.
Kali Linux, formerly BackTrack, is a specialized Debian distribution that includes hundreds of built-in security tools. I can’t begin to imagine the time I might have spent with Homebrew installing and configuring everything included here. The tools included with Kali are many and powerful, and I’ve discovered a new fuzzer, proxy, or scanner to try for every topic in the book.
The authors of WAHH frequently plug their own Burp Suite, a closed-source intercepting proxy that costs $300 per user per year to do anything useful. Zed Attack Proxy, developed by the Open Web Application Security Project (OWASP), is completely free, Apache licensed, and just as good an educational and testing tool. (It’s included in Kali, along with the free edition of Burp). Hacking tools are not always the most carefully crafted software, but ZAP is an extremely stable, very pleasant exception. WebScarab, also by OWASP, is another good free alternative.
I remember the joy of my first successful SQL injection like it was last Thursday. (It was last Thursday, but that’s beside the point). The thrill of breaking in with a well-placed apostrophe and a couple of dashes takes a while to wear off, but diminishing returns are likely to set in after 50 handcrafted variations on the same GET parameter. Fortunately, there’s SQLMap, which almost makes it too easy, automating the entire process of finding and exploiting SQL injection vulnerabilities.
All these tools are no fun without something to (safely, responsibly, legally) attack. Browsing through WAHH, I was excited to see lots of links to online interactive labs illustrating almost every concept. I was less excited to discover that they’re completely proprietary and cost $7/hour. Fortunately, there are plenty of open alternatives:
Metasploitable2 is an intentionally vulnerable virtual machine configured to run several vulnerable web applications on port 80 by default, including Damn Vulnerable Web App and Mutillidae. Before booting it up, please make sure your network settings are configured correctly: it should never ever be exposed to users on your network or the internet.
I’m not reading WAHH to become a professional pentester. I’m doing it to learn how to develop safe web applications, and we write lots of them in Rails. RailsGoat (yet another OWASP project) is a vulnerable Rails application with built-in documentation and examples of the top 10 web vulnerabilities. Best of all, each one includes code samples, which are especially useful for a developer like me trying to avoid writing a goat of my own.
Many of these resources come courtesy of OWASP, the Open Web Application Security Project. In addition to developing lots of free tools, they’re an excellent resource for learning about web security.
I’d been meaning to try the core.logic library for a while, so I decided to come up with a logic programming solution.
If you haven’t yet checked out core.logic
, there are a growing
number of resources on the web. To get up to speed, I worked through David Nolen’s logic tutorial, the Magical Island of Kanren, the
core.logic
Primer,
and a few chapters of The Reasoned
Schemer.
Hopefully, following along with my test cases will explain the subset
of logic functions in my solution.
I’ll start with a basic speclj test setup:
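A minimal setup might look like this (the namespace names here are placeholders, not the original project’s):

```clojure
(ns prime-factors.core-spec
  (:require [speclj.core :refer :all]
            [prime-factors.core :refer :all]))
```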
For the production code, I’ll start with core.logic
and three functions from the finite domain namespace.
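The three finite-domain helpers used in the rest of the post are in, interval, and eq, so the namespace declaration presumably looked something like this (names are placeholders):

```clojure
(ns prime-factors.core
  (:refer-clojure :exclude [==])
  (:require [clojure.core.logic :refer [run* fresh ==]]
            [clojure.core.logic.fd :as fd]))
```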
On to the tests! The declarative aspect of logic programming feels well-suited to a mathy problem like this. Instead of describing how to factorize, I’ll write a few definitions and let the logic engine handle the rest. The strategy I have in mind is the same simple one I learned in 8th grade algebra: start with a composite integer, and decompose it into a factor tree whose leaves are prime factors.
To start, I’ll define “factor pairs”: a vector of two integers that
multiply to another. So, [2 3]
is a factor pair of 6, [1 23]
a
factor pair of 23. Here’s the most general test I came up with:
There’s a lot of syntax right off the bat, but this test isn’t as confusing as it might look. So far, I’ve found it easiest to understand logic programming as describing a set of constraints on a solution. This test describes the first constraint: factor pairs are vectors of two elements.
Here, _0
and _1
represent reified, unbound logic variables: symbolic
representations of logic variables that can take on any value. (The
numbers at the end indicate that they’re two different variables: _0
and _1
can take on different values). So this
test simply describes the most general constraint: the factor-pairs
function should take something as an argument and return a list of
vectors of two things–any two things!
Here’s a function that does just that:
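A reconstruction of that function (a sketch, per the description that follows):

```clojure
(defn factor-pairs [number]
  (run* [pair]
    (fresh [factor1 factor2]
      (== pair [factor1 factor2]))))
```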
The run*
function is the, uh, core of core.logic
, used to set up
problems for the logic engine to solve. It returns a list of all the
solutions that match the constraints defined inside. The fresh
form
inside of run*
is
analogous to let
: a way to declare local logic variables. The first
two lines say “Find all
the solutions where pair, factor1, and factor2 meet these
constraints,” and the third describes the only constraint: “A pair is a
vector of two elements, factor1 and factor2”.
Note that I’m ignoring the number
argument! At this point
(factor-pairs 81)
, (factor-pairs 72)
, and (factor-pairs 23)
all
return the same result. For now, calling this function factor-pairs
is a little misleading, since it returns the equivalent of all
possible pairs of two things. But now that the tests pass, we can add
another constraint:
Here, I’ll describe the next constraint: factor pairs should only be
defined between 2 and n. (Yes, a pair like [1 23]
is technically a
pair of factors, but it’s not very useful for my prime factorization purposes).
I may be open to a little TDD legal trouble
with this test update, but I’ve added a couple helper functions to
keep the tests as declarative as possible. Should-all
asserts that a
predicate matches for every element in a collection. In-interval?
tests whether a pair is in the range low
to high
, inclusive.
Hopefully, Two-elements?
explains itself. Since factor-pairs
will
now return a list with many elements, I’ve generalized the original test.
It only takes one line to add the extra constraint:
The new line declares that factor1
and factor2
must both be in the
finite interval 2-number
. Factor-pairs
is still something of a
misnomer: it now returns the Cartesian product of all numbers
2-number
. But it’s a step closer. I’ll add one more constraint:
Factor pairs contain two elements between 2 and n that equal n when multiplied. That’s a complete definition of a factor pair, and adding the third constraint completes the function:
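The completed function, reconstructed from the constraints described here:

```clojure
(defn factor-pairs [number]
  (run* [pair]
    (fresh [factor1 factor2]
      (fd/in factor1 factor2 (fd/interval 2 number))
      (fd/eq (= number (* factor1 factor2)))
      (== pair [factor1 factor2]))))
```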
Here, eq
converts an arithmetic expression into a constraint, as
long as the logic variables are defined over a finite domain. So the
final constraint simply says number
must equal factor1
times
factor2
. If you’re not convinced by the tests, try it in a REPL:
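A reconstructed transcript (not the original REPL output):

```clojure
(factor-pairs 12)
;; e.g. ([2 6] [3 4] [4 3] [6 2])
```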
There are some properties here that I’ll use to my advantage. First, by default, factor pairs are ordered by the first factor, ascending. Second, I’ve already created an implicit definition of primes: numbers that return an empty list of factor pairs. I’ll add it as a test for clarity:
Now I’ll move on to decomposition. If you’ve watched a prime factors kata before, you’ve probably seen a few standard tests: start by testing 1, then 2, then 3, and then composites. Here’s where something similar comes in:
Of course, it’s easy to pass this one:
I’ll make it a little harder with the next bit of the definition. Primes should decompose into themselves:
I’ll get back to passing with a small tweak. Instead of returning
'(1)
, decompose
should check if a number has any
factor pairs. If not, return the number.
That’s the easy part, but what about composites? Well, two times a prime should certainly decompose to 2 and the prime:
Decompose
already includes the base case at the bottom of the
prime factor tree. If I feed it a number that’s not prime, it should
decompose its factors until it runs into a prime. Concatenating the
results should return a nice list of prime factors:
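A reconstruction of the final decompose, following the description above:

```clojure
(defn decompose [number]
  (let [pairs (factor-pairs number)]
    (if (empty? pairs)
      (list number)
      (let [[factor1 factor2] (first pairs)]
        (concat (decompose factor1) (decompose factor2))))))
```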
And that’s it! Here’s a last set of test cases to confirm that it works:
This isn’t the fastest way to find prime factors, but it’s no worse
than the typical trial division solution to the prime factors kata.
Using core.logic
to declare definitions and constraints on the
solution feels uniquely concise, expressive, and clear, and the
exercise helped me get a more concrete handle on logic programming.
You can find the code from this post (and a few earlier iterations) as
a Gist here.
First, grab VirtualBox. It may be handy for more than testing: in addition to running Internet Explorer, you might want to try a new Linux distro or do some pen testing.
Microsoft recently rolled out free virtual machine images for IE
testing that greatly simplify the setup process. Download and extract the versions you’re interested in.
Some of these are distributed as full images, but most come as several .rar
files and a
self-extracting archive. To extract from the command line:
This should create a new .ova
file in the same directory. Open VirtualBox and import it (File > Import Appliance on OS X).
Select the machine you’d like to connect from and choose Settings > Network. Make sure the network adapter is enabled and attached to NAT. (This should be the default setting).
You can access the host machine from inside the VM at 10.0.2.2–e.g., if you access your Rails dev server at localhost:3000, you can connect at 10.0.2.2:3000. Unfortunately, running my Rails dev server from the host machine resulted in occasional redirects to localhost. Fix this by editing the Windows hosts file.
The Windows hosts file is available in
C:\windows\system32\drivers\etc\
. In XP, you’ll be able to edit it
directly with Notepad. On a Vista VM, you’ll need admin privileges.
Find Notepad in the start menu, right click, and choose “Run as
Administrator.” Then, edit the hosts file to direct localhost to
10.0.2.2, the default VirtualBox address for the host machine.
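The entry itself is a single line:

```
10.0.2.2    localhost
```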
Run your Rails dev server on the host machine, and connect to localhost:3000 as usual. Congratulations! You’re ready for the joy of Internet Explorer. For bonus points, configure Capybara to run in IE.
For all the details on namespaces, require
, and the ns
macro, see
Colin’s
post,
which is still the authoritative source. But if you’re a Python
programmer looking for a quick reference, here’s the Python-Clojure
Rosetta stone I went looking for the first time I deadpanned into my
Clojure REPL.
(Note: star imports are usually as ill-advised in Clojure as they are in Python!)
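A hedged sketch of the core correspondences (foo, bar, and f are placeholder names):

```clojure
;; import foo
(require 'foo)

;; import foo as f
(require '[foo :as f])

;; from foo import bar
(require '[foo :refer [bar]])

;; from foo import *  (ill-advised in both languages)
(require '[foo :refer :all])
```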
First, let’s define a few trees up front for testing. The first is the example given in the introduction of the paper. The other two are a little trickier, and the twelve-node tree is not binary.
To calculate the visit order, we can perform a simple breadth-first traversal of the tree. Now that we have a queue, the solution is pretty close to the pseudocode: Start with the root node in the queue, and assign it a number. Then recur with three new parameters: A queue with this node’s children inserted, a list with this node’s number consed on, and an incremented node number. When the queue is empty, we’re done:
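A sketch of that traversal. Everything below is reconstruction: it assumes the queue operations from earlier in this series (empty-queue, queue-empty?, queue-first, remove-item, insert-items) plus node-value and children accessors for tree nodes.

```scheme
(define (visit-order tree)
  (let loop ((queue (insert-items empty-queue (list tree)))
             (order '())
             (n 1))
    (if (queue-empty? queue)
        (reverse order)
        (let ((node (queue-first queue)))
          (loop (insert-items (remove-item queue) (children node))
                (cons (cons (node-value node) n) order)
                (+ n 1))))))
```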
It’s possible to map a visit queue back to a binary tree (check out
Michael’s concise Haskell
solution), but I
wanted a solution that would work for all trees. In the end, I settled for performing a second depth-first
traversal
to label nodes. This walk-map
function works like a recursive map
,
traversing a nested structure and applying a function to every
element that’s not a list:
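A sketch matching that description:

```scheme
(define (walk-map f tree)
  (map (lambda (element)
         (if (list? element)
             (walk-map f element)
             (f element)))
       tree))
```

For instance, (walk-map (lambda (x) (* x x)) '(1 (2 3))) evaluates to (1 (4 9)).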
Here’s a convenience function to store node order in a hash set:
And at long last, a solution for breadth-first numbering: calculate node order, then map over the tree to apply the labels:
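The top-level function might look like this (node-order is my assumed name for the hash-building helper above):

```scheme
(define (number-tree tree)
  (let ((order (node-order tree)))
    (walk-map (lambda (node) (hash-ref order node)) tree)))
```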
On the plus side, this solution generalizes to non-binary trees, and is built almost entirely out of Scheme primitives. It’s not as concise or efficient as I’d like, but I’m happy with my lazy lists and functional queue, even if the implementation is a little long. You can find an edited version of my solution here, and all the code from these posts as a Gist here.
Our simple queue consisted of two lists: one for the head of the queue, and a reversed one for the tail:
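For example, a queue holding a through e might be stored with head (a b c) and the tail (d e) reversed:

```scheme
(list '(a b c) '(e d))
```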
Our improved queue makes two changes: lazy lists and incremental reversal. It will look like this:
This looks a little complicated, but just like the simple queue, it’s also a list of a head and reversed tail, with each side storing the length of the associated list. This equivalent is a little simpler:
To start implementing the improvements, we need to update the selectors to get the lists and lengths from both sides:
Now we can write an updated insert function:
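A sketch of that insert (the queue constructor and selector names are my assumptions): cons onto the reversed right side and bump its length.

```scheme
(define (insert q item)
  (queue (left-len q) (left-list q)
         (+ 1 (right-len q)) (cons item (right-list q))))
```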
Remove is a little more complicated. In the simple queue, we simply swapped and reversed the right side. We want our improved queue to avoid reversing long right side lists. The solution is incremental reversal: rebalance the queue every time an element is removed.
In Okasaki’s implementation,
this is done with functions called make-queue
and rotate
. Below
are my Scheme translations.
Rotate
reverses the right side list and concatenates it to the left.
It’s similar to the simple queue implementation, but it uses lazy
list operators and it’s designed to work incrementally:
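A sketch of rotate, using the lcons, lcar, and lcdr operators from the lazy-list installment of this series: left is a lazy list, right is an ordinary reversed list, and acc accumulates right in re-reversed order.

```scheme
(define (rotate left right acc)
  (if (null? left)
      (lcons (car right) acc)
      (lcons (lcar left)
             (rotate (lcdr left) (cdr right) (lcons (car right) acc)))))
```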
Make-queue
implements the incremental reversal logic. Now, we no
longer wait until the head list is empty to swap and reverse. Instead,
we rotate the queue as soon as the tail list contains one more element
than the head. This keeps the queue balanced, and ensures that we
won’t run into an expensive reversal:
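A sketch of make-queue with the same assumed queue constructor: rotate as soon as the right side is longer than the left.

```scheme
(define (make-queue left-len left right-len right)
  (if (<= right-len left-len)
      (queue left-len left right-len right)
      (queue (+ left-len right-len) (rotate left right '())
             0 '())))
```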
To maintain a balanced queue, we’ll want to call make-queue
on
insertion and removal. Here’s an improved insert, and a new remove:
Finally, let’s add a couple convenience functions to insert and remove multiple items:
Next time, we’ll finally bring everything together to solve the breadth-first numbering problem.
Clojure’s lazy seqs seemed powerful and mysterious until I read
through chapter
3
of SICP. Building a lazy list is based on two simple operations,
delay
and force
:
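Racket provides delay and force built in, but the idea is easy to build from scratch; a sketch (with my- prefixes to avoid colliding with the built-ins):

```scheme
(define-syntax my-delay
  (lambda (stx)
    (syntax-case stx ()
      ((_ form) #'(lambda () form)))))

(define (my-force delayed)
  (delayed))
```

For example, (my-force (my-delay (+ 1 2))) evaluates to 3.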
Delay
wraps a form in an anonymous function of no arguments. It can
be stored and passed around like any other list, but won’t perform the
computation “stored” inside until it’s evaluated. Force
is the
opposite of delay, forcing a delayed form to evaluate. From these two
basics, we can build a lazy version of cons:
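A sketch of lcons, written here against Racket’s built-in delay so the snippet stands alone:

```scheme
(define-syntax lcons
  (lambda (stx)
    (syntax-case stx ()
      ((_ item items) #'(cons item (delay items))))))
```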
Racket’s macro system uses syntax-case macros, which are a little different from the comma-spliced defmacro beasts you know and love from Common Lisp and Clojure. In addition to enforcing good hygiene, the syntax-case macro system works by pattern matching against syntax objects. In lcons, any form that matches the pattern (lcons item items) is mapped to (cons item (delay items)). In the delay macro above, anything matching (delay form) maps to (lambda () form). We’re still defining the ways we want to change the syntax of our program, but the transformation is applied at a different level: to syntax objects instead of raw s-expressions.
With lcons finished, it’s easy to create lazy lists:
[code listing omitted]
Just as eager lists are composed of nested pairs, lazy lists are composed of nested, delayed pairs:
[code listing omitted]
Like a normal list, the car of each pair is the list item, and the cdr represents the rest of the list. But it doesn’t return the next pair until it’s evaluated:
[code listing omitted]
With this behavior in mind, we can write lazy car and lazy cdr:
[code listing omitted]
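In the Python analogue (lazy cells as `(head, thunk)` pairs), the lazy accessors are one-liners:

```python
def lcar(llist):
    return llist[0]       # the head is already evaluated

def lcdr(llist):
    return llist[1]()     # force the delayed tail

nums = (1, lambda: (2, lambda: None))  # the lazy list (1 2)
```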
A take-n function is also handy for converting lazy lists back to eager ones:
[code listing omitted]
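A Python sketch of take-n, with an infinite lazy list thrown in to show why laziness matters (`integers_from` is my own example, not from the original post):

```python
def lcar(llist): return llist[0]
def lcdr(llist): return llist[1]()   # force the delayed tail

def take_n(n, llist):
    # realize the first n items of a lazy list as an eager Python list
    out = []
    while n > 0 and llist is not None:
        out.append(lcar(llist))
        llist = lcdr(llist)
        n -= 1
    return out

def integers_from(k):
    # an infinite lazy list: k, k+1, k+2, ...
    return (k, lambda: integers_from(k + 1))
```

Only the cells that take-n walks over are ever evaluated, so asking for five integers from an infinite list terminates.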
And that’s it! We’ve written all the basics necessary for lazy lists. (For a few more lazy-fied list operations, see section 3.5.1 of SICP).
Finally, we should make one important optimization. Over the course of list operations like lcdr, the same delayed form can be called many times. If the delayed computation is simple, this won’t be noticeably inefficient. In our case, we’re just storing values that are immediately returned (integers in these examples, and eventually some node representation in our numbering solution). But there’s no guarantee that delayed computations will be cheap! We could put functions in a lazy list just as easily:
[code listing omitted]
And those functions could require a lot of work:
[code listing omitted]
In practice, we should memoize lazy computations, so subsequent calls look up their previously computed values. It’s an easy fix:
[code listing omitted]
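A Python sketch of the memoized delay (the name `memo_delay` is mine): the first force runs the computation and caches the result, and every later force returns the cached value.

```python
def memo_delay(thunk):
    # memoized delay: the computation runs at most once;
    # subsequent forces look up the cached value
    cache = []
    def forcer():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return forcer

calls = []
expensive = memo_delay(lambda: calls.append("work") or 99)
```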
Now that we’ve written lazy lists, we can use them to build an efficient functional queue for the breadth-first numbering problem.
For example, numbering this tree:
[code listing omitted]
Should yield this tree:
[code listing omitted]
If you’ve ever solved a search problem, this might sound stupid easy. But getting the details of a functional solution right can be a challenge. As Okasaki puts it in the paper:
…I presented the problem to many other functional programmers and was continually amazed at the baroque solutions I received in reply. With only a single exception, everyone who came near a workable answer went in a very different direction from my solution right from the very beginning of the design process. I gradually realized that I was witnessing some sort of mass mental block, a communal blind spot, that was steering programmers away from what seemed to be a very natural solution.
Before you read my baroque solution, you might want to try for yourself. I’ll wait.
Although I love Clojure, using built-in queues and lazy seqs felt like cheating. So I chose to use Racket with Rackunit, and tried to use as many primitives as possible.
Breadth-first traversal is easy with a queue, but an efficient functional queue can be tricky. Consing an element onto the front of a Scheme list is cheap, but appending is expensive—it requires “cdring down” over all the elements. One solution (cribbed from Okasaki himself) is to represent a queue as a pair of lists. The list on the left is the head of the queue, so elements can be popped off in O(1) time. The right side represents the rest of the elements in reverse, so elements can be pushed onto the end in constant time. Here are the first steps towards an implementation: an empty queue with left and right selectors.
[code listing omitted]
Inserting an item conses it onto the right-side list:
[code listing omitted]
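The original listings are lost, but a Python analogue of the representation and insert is short. (The shape is my assumption: cons cells as 2-tuples, `None` as the empty list, and a queue as a pair of lists.)

```python
# cons cells as 2-tuples, None as the empty list;
# a queue is a pair (left, right): left holds the front of the
# queue in order, right holds the back in reverse
def cons(a, d): return (a, d)
def car(p): return p[0]
def cdr(p): return p[1]

empty_queue = cons(None, None)

def insert(q, item):
    # consing onto the reversed right side is O(1)
    return cons(car(q), cons(item, cdr(q)))
```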
To dequeue an item, “remove” it from the left side with car, and return a new queue with the cdr of the left-side list:
[code listing omitted]
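In the same Python analogue, the naive dequeue is just car plus cdr on the left side:

```python
def car(p): return p[0]
def cdr(p): return p[1]

def dequeue(q):
    # pop the first element off the left side; this only works
    # while the left-side list is non-empty
    left, right = q
    return car(left), (cdr(left), right)
```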
When the left side is out of elements, reverse the right-side list, and swap it with the left. Here’s the buildup to swap-and-reverse-car:
[code listing omitted]
Now we can write a dequeue function that really works:
[code listing omitted]
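A Python sketch of the swap-and-reverse step and the dequeue built on top of it, again on the assumed pair-of-lists representation:

```python
def cons(a, d): return (a, d)
def car(p): return p[0]
def cdr(p): return p[1]

def reverse_list(lst):
    # O(n) reversal of a cons-cell list
    out = None
    while lst is not None:
        out = cons(car(lst), out)
        lst = cdr(lst)
    return out

def swap_and_reverse(q):
    # when the left side is empty, swap in the reversed right side
    left, right = q
    if left is None:
        return cons(reverse_list(right), None)
    return q

def dequeue(q):
    left, right = swap_and_reverse(q)
    return car(left), cons(cdr(left), right)
```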
That’s all it takes to build a simple functional queue. Unfortunately, it’s not very efficient. Reversing a list is the kind of O(n) operation we built our queue to avoid in the first place, but if many more items are inserted than removed, we’ll end up reversing and swapping a lot. We can do better—and I’ll explain how in my next post.
Running huge, complicated queries with a few lines of code was awesome, but my Hadoop installation made a lot of noise whenever I tried a query in the REPL using the ?<- query executor, printing lots of unwanted log info to stdout. (I was using Hadoop installed over Homebrew instead of the readme recommendation.) Fortunately, it’s easy to hush the logger a little by running queries inside cascalog.io/with-log-level. Here’s a quick two-line macro that wraps calls to ?<- in with-log-level to quiet down Hadoop:
(require '[cascalog.io :refer [with-log-level]])
(defmacro ?<-- [& forms] `(with-log-level :fatal (?<- ~@forms)))
For future reference, you can find a gist here.
In a perfect TDD world, this question always has a good answer. In my case, it didn’t: this was a complicated line that was twice removed from the test that “created” it. But the path forward was clear right away: write a test for this line, in isolation, the way I should have done from the start.
Good test-driven development requires a lot of restraint and self-discipline, and lines like my faulty byte reader are guaranteed to sneak in if I break one of the laws of TDD—even if it’s only a couple of lines of support code that weren’t written just to pass a test, or a test that looks comprehensive but is really trying to cover too much at once. This question is a great way to hunt them down and fix them: ask every line of code you write to justify its existence.
As a young software engineer, I learned three variables by which to manage projects: speed, quality, and price. The sponsor gets to fix two of these variables and the team gets to estimate the third. If the plan is unacceptable, the negotiating starts. This model doesn’t work well in practice. Time and costs are generally set outside the project. That leaves quality as the only variable you can manipulate. Lowering the quality of your work doesn’t eliminate work, it just shifts it later so delays are not clearly your responsibility. You can create the illusion of progress this way, but you pay in reduced satisfaction and damaged relationships. Satisfaction comes from doing quality work. The variable left out of this model is scope. If we make scope explicit, then we have a safe way to adapt, we have a safe way to negotiate, we have a limit to ridiculous and unnecessary demands.
—Kent Beck, Extreme Programming Explained
I spent all last week working on my web server. It’s a fun project so far, filled with the joy of taking something apart and looking inside to see how it works, but it’s also been a challenge: I had a long checklist of features to implement and only a week to get them all working.
Even though I was still frantically coding on the train to work the day of my demo, I managed to check off all the boxes. Like the 100% test coverage my project reported, 100% box coverage felt great—like the satisfaction of crossing the last item off a long to-do list. But as any test-driven developer knows, even 100% test coverage can’t guarantee that a project will work. This week I learned that box coverage is the same: ticking off features is no guarantee of quality.
Sure, my server met the requirements, but much of the code wasn’t pretty, and I knew it. And though I was proud of the progress I made and looking forward to showing off my work, the demo went off the rails early on, when the server hung and crashed trying to handle concurrent connections. (If you’re thinking of using Thread.run(), you probably want Thread.start(), by the way.) In an instant, all the little details I’d put effort into—nice-looking directory pages, support for extra headers and obscure content types, clean request parsing under the hood—were outweighed by one big defect.
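The same trap exists in Python’s threading module, for what it’s worth: calling run() executes the target inline in the calling thread, while start() actually spawns a new one. A quick sketch:

```python
import threading

seen = []

def handler():
    # record whether this call ran on the main thread
    seen.append(threading.current_thread() is threading.main_thread())

t1 = threading.Thread(target=handler)
t1.run()      # no new thread: the target runs inline in the caller

t2 = threading.Thread(target=handler)
t2.start()    # spawns a real worker thread
t2.join()
```

After this runs, `seen` shows that the first call never left the calling thread, which is exactly why a server written that way can only handle one connection at a time.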
The attitude towards quality at 8th Light is clear: quality is non-negotiable, we will never ship software with known defects, and when an unknown one slips by, we’ll fix it for free. That leaves scope as the only free variable in the planning and development process. Although the scope of my web server project was already explicit, I didn’t do a good job negotiating. In retrospect, it’s clear that showing a clean, stable server that only handles GET requests is a greater accomplishment than one with extra bells and whistles that’s prone to random catastrophic failure. But it sure felt good to check off all those boxes.
I’ve learned two lessons over the last week: first, quality and stability matter most. Never sacrifice quality, and never ever tolerate unstable code. Second, renegotiating and giving feedback is part of making scope explicit. Trading off quality for features is guaranteed to be a bad bargain.
I was reminded right away of the diagram in Chapter 2.1 of SICP (the section on building a rational number arithmetic system). It’s not a coincidence: enforcing abstraction between layers is one reason the Internet and TCP/IP are such powerful tools. Any system that does anything interesting is necessarily composed of smaller parts. Whether they’re functions, objects, or sentences, the way they’re put together matters. But equally important is the way they interact, and how they’re separated. (Network protocols give good advice on this, too.)
“Very few writers really know what they are doing until they’ve done it. Nor do they go about their business feeling dewy and thrilled. They do not type a few stiff warm-up sentences and then find themselves bounding along like huskies across the snow. One writer I know tells me that he sits down every morning and says to himself nicely, “It’s not like you don’t have a choice, because you do – you can either type, or kill yourself.” We all often feel like we are pulling teeth, even those writers whose prose ends up being the most natural and fluid. The right words and sentences just do not come pouring out like ticker tape most of the time. […] For me and most of the other writers I know, writing is not rapturous. In fact, the only way I can get anything written at all is to write really, really shitty first drafts.”
–Anne Lamott on shitty first drafts, from Bird By Bird
This is supposed to be a professional blog, but I hope you’ll pardon this bit of language, which comes with some of the best and dearest advice on professionalism I’ve read.
At the end of last week, I moved on to the second project of my apprenticeship: writing a simple HTTP server from scratch. Starting the project was not rapturous. I knew the basics—model the filesystem, connect over a socket, and transfer specially-formatted text back and forth—but had no idea where to start or what test to write first. I read up on sockets, browsed through the HTTP spec, and stared dumbly into IntelliJ for a while, and at long last, I started typing. What Anne Lamott calls “shitty first drafts” the TDD world calls “spikes”—short experiments, sometimes without tests, to figure out what to work on next. A spike is like a “child’s draft,” allowed to run wild and break things on condition that it will be thrown out and replaced with something decent (and well-tested).
Of course, as soon as I had a few lines of code down, the ideas started to flow (I started by parsing headers, by the way), and within an hour I had a feature-complete HTCPCP server (HTTP is still a work in progress). Another reminder that programming is craft, and writing code really is a creative act.