For the most part, test-driven development has been a breeze so far. In addition to enforcing good habits, watching red tests turn green provides an extremely satisfying dopamine kick every few minutes, all day. Testing simple objects and actions in my Tic Tac Toe game—things like the board, player behavior, and the Minimax algorithm—was straightforward. But testing the view and controller classes that glue them together required a little more thought.

Writing tests required interrupting the game loop to check on the behavior of the view and controller objects. To start, I added optional flag arguments to many methods that would break the game loop so I could make assertions about game state. I quickly came to realize that this was a bad solution, and I cringed the next day when a chapter of Clean Code described boolean flags as one of the most rancid code smells around.

I came across the concept of test doubles and mock objects, and got the idea right away: create fake objects with the same methods as real ones, override their behavior, and use them as substitutes for their more complicated counterparts in unit tests.

Or at least, I thought I got the idea. With my tests passing and my game working, I felt pretty good about my project. But wiring up a code coverage tool showed that my view and controller classes were only half covered by my tests. What went wrong? As it turned out, I was testing my mocks! Here’s the pattern I was following:

  • Create a subclass of the object you want to test.
  • Override or stub out methods to return predetermined output.
  • Write assertions against the behavior of the mock objects.

This meant, of course, that I was never actually testing the real objects, only the fakes, as the code coverage data revealed. Worse, all the tests I thought proved my code was working were essentially tautologies: assertions that my stubs returned exactly the values I had stubbed them to return. This probably looks obvious to experienced TDD practitioners, but it was surprisingly easy to fall into this antipattern. For the record, here’s how you should really use a mock object:

  • Create mocks of the objects that interact with the one you want to test.
  • Override or stub out methods to return predetermined output.
  • Write assertions against the behavior of the real object interacting with the test doubles.
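To make the difference concrete, here’s a minimal sketch in Python (the Game and View classes are hypothetical stand-ins, not code from my actual game): the test double replaces the collaborator, while the assertions exercise the real object.

```python
class View:
    """The real view writes to the terminal."""
    def show(self, message):
        print(message)

class FakeView(View):
    """Test double: records messages instead of printing them."""
    def __init__(self):
        self.messages = []
    def show(self, message):
        self.messages.append(message)

class Game:
    """The real object under test."""
    def __init__(self, view):
        self.view = view
    def announce_winner(self, player):
        self.view.show(player + " wins!")

fake = FakeView()
Game(fake).announce_winner("X")      # exercise the REAL Game...
assert fake.messages == ["X wins!"]  # ...and inspect the double it talked to
```

The coverage tool stays happy, too: every line of the real Game runs under test, and only the collaborator is faked.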

In retrospect, this makes perfect sense. But I’ll be going over all my tests with a careful eye tomorrow. Sometimes the green light isn’t what it seems.

Of all the things John von Neumann invented or helped invent (including the Minimax theorem behind my Tic Tac Toe AI, and the architecture of the computer it runs on), Monte Carlo simulation, which he developed with Stanislaw Ulam, is one of my favorites. Unlike some of his other breakthroughs, the idea behind Monte Carlo simulation is simple: use probability and computation to estimate solutions to hard problems.

Think of the craziest integral you’ve ever solved, and the sheaves of notebook paper spent working out the area under that ridiculous curve. The Monte Carlo approach is a clever hack: tack a graph of the function to a dartboard, and start throwing darts at random. As more and more darts hit the board, the ratio of darts that land under the curve to total darts thrown will approximate the proportion of the dartboard under the curve. Multiply this ratio by the area of the dartboard, and you’ve computed the integral. As a student whose favorite fourth grade problem solving strategy was “guess and check,” and whose favorite tenth grade problem solving strategy was “plug it into your TI-83,” the idea has a lot of appeal.
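Here’s the dart-throwing trick as a Python sketch (an illustration, assuming the curve stays inside the unit-square dartboard), estimating the integral of x² from 0 to 1, whose true value is 1/3:

```python
import random

def monte_carlo_integral(f, n=100_000, seed=0):
    """Estimate the integral of f over [0, 1] by throwing n random darts
    at the unit square and counting those that land under the curve."""
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    under = sum(1 for _ in range(n) if rng.random() <= f(rng.random()))
    # The dartboard has area 1, so the hit ratio approximates the integral.
    return under / n

print(monte_carlo_integral(lambda x: x * x))  # close to 1/3
```

More darts mean a better estimate: the error shrinks roughly with the square root of the number of throws.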

Wikipedia has a great example of estimating the value of Pi with a Monte Carlo simulation. I wrote a similar simulation in Python a couple months ago, but wanted to check out Clojure’s Incanter library for statistical computing and try my hand at a TDD solution.

To start, you’ll want the speclj test framework for Clojure, which uses RSpec-like syntax and provides a helpful autorunner. Create a new Leiningen project and add it as a dev dependency and plugin in project.clj. Make sure you set up your spec directory as described in the documentation. You’ll also want to add Incanter to your project dependencies. Here’s my final project.clj:

project.clj
(defproject montecarlopi "0.1.0"
  :dependencies [[org.clojure/clojure "1.5.0"]
                 [incanter "1.5.0-SNAPSHOT"]]
  :main montecarlopi.core
  :profiles {:dev {:dependencies [[speclj "2.5.0"]]}}
  :plugins [[speclj "2.5.0"]]
  :test-paths ["spec/"])

To start, we’ll need a way to tell whether a given point is inside the circle. Here’s a simple test:

core_spec.clj
(ns montecarlopi.core-spec
  (:require [speclj.core :refer :all]
            [montecarlopi.core :refer :all]))

(describe "in-circle?"
  (it "should return true for points inside the circle."
      (should (in-circle? [0 0.1])))

  (it "should return false for points outside the circle."
      (should-not (in-circle? [0.8 0.9]))))

Fire up the test runner with lein spec -a, and you’ll see an exception, since the function doesn’t exist yet. Time to add it to core.clj:

core.clj
(ns montecarlopi.core
  (:require [incanter.core :refer [sq]]))

(defn in-circle? [point]
  (let [[x y] point]
    (<= (+ (sq x)
           (sq y))
        1.0)))

The let binding destructures the x and y coordinates from the vector representing a point. If x squared plus y squared is at most 1, the point lies inside (or on) the unit circle. Save and watch the autorunner turn green.

Next, we need a way to generate points at random, with x and y values between 0 and 1. Here’s a test:

core_spec.clj
(describe "generate-random-point"
  (it "should return a point with x and y values between 0 and 1."
      (let [[x y] (generate-random-point)]
        (should (<= x 1))
        (should (>= x 0))

        (should (<= y 1))
        (should (>= y 0)))))

And a dead-simple implementation (clojure.core/rand conveniently generates random floats between 0 and 1):

core.clj
(defn generate-random-point []
  [(rand) (rand)])

We need a lot of random points, so we’d better add a function to generate them. Here’s a test and a function that passes:

core_spec.clj
(describe "generate-random-points"
  (it "should return the specified number of random points."
    (should= 100 (count (generate-random-points 100)))))
core.clj
(defn generate-random-points [n]
  (take n (repeatedly generate-random-point)))

Once we’ve generated lots of points, we’ll want to know how many lie inside the circle. This test is a little more complicated, since it requires some mock points to test against:

core_spec.clj
(describe "points-in-circle"
  (it "should return only points inside the circle."
    (let [inside  [[0.0 0.0]
                   [0.2 0.3]
                   [0.1 0.8]
                   [0.0 1.0]
                   [1.0 0.0]]

          outside [[1.0 0.2]
                   [0.8 0.8]
                   [0.5 0.9]
                   [0.8 0.7]
                   [0.9 0.9]]
          points  (concat inside outside)]
      (should= 5 (count (points-in-circle points)))
      (should= inside (points-in-circle points))
      (should-not= outside (points-in-circle points)))))

But a passing solution is as easy as using in-circle? as a filter:

core.clj
(defn points-in-circle [points]
  (filter in-circle? points))

In order to plot the points, we’ll need to sort them into groups. A map should do the trick. Let’s also take this chance to pull inside, outside, and points out of the let binding and define them as vars so we can reuse them. Here’s the result after a quick refactor:

core_spec.clj
(def inside  [[0.0 0.0]
              [0.2 0.3]
              [0.1 0.8]
              [0.0 1.0]
              [1.0 0.0]])

(def outside [[1.0 0.2]
              [0.8 0.8]
              [0.5 0.9]
              [0.8 0.7]
              [0.9 0.9]])

(def points (concat inside outside))


(describe "points-in-circle"
  (it "should return only points inside the circle."
    (should= 5 (count (points-in-circle points)))
    (should= inside (points-in-circle points))
    (should-not= outside (points-in-circle points))))

(describe "sort-points"
  (it "should return a map of correctly sorted points."
      (should= {:inside  inside
                :outside outside}
               (sort-points points))))

This code passes, but it’s not as clear as it could be:

core.clj
(defn sort-points [points]
  {:inside  (filter in-circle? points)
   :outside (filter #(not (in-circle? %)) points)})

I’d like to pull that anonymous function out and name it outside-circle?. So let’s add a test:

core_spec.clj
(describe "outside-circle?"
  (it "should return true for points outside the circle."
      (should (outside-circle? [0.9 0.7]))))

Write the function:

core.clj
(defn outside-circle? [point]
  (not (in-circle? point)))

And refactor sort-points once speclj gives us the green light:

core.clj
(defn sort-points [points]
  {:inside  (filter in-circle? points)
   :outside (filter outside-circle? points)})

It shouldn’t be hard to convert sorted points into a ratio. Here’s a test and solution:

core_spec.clj
(describe "point-ratio"
  (it "should return the ratio of points inside the circle to total points."
      (should= 0.5 (point-ratio points))))

Make sure you return a float instead of a rational:

core.clj
(defn point-ratio [points]
  (float (/ (count (points-in-circle points))
            (count points))))

And now we’re just a step away from estimating Pi. Our random points fall in a unit square that contains a quarter of the unit circle. The square has area 1, and the quarter circle has area Pi/4, so the fraction of points landing inside the circle approximates Pi/4. Multiplying by four gives us our estimate:
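Written out, with N standing for point counts drawn uniformly from the unit square:

```latex
\frac{N_{\text{inside}}}{N_{\text{total}}}
  \approx \frac{\text{area of quarter circle}}{\text{area of square}}
  = \frac{\pi/4}{1}
\qquad\Longrightarrow\qquad
\pi \approx 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}
```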

core_spec.clj
(describe "estimate-pi"
  (it "should return four times the ratio of inside points to total points."
      (should= 2.0 (estimate-pi points)))

  (it "should return something close to pi when given a lot of points."
      (let [estimate (estimate-pi (generate-random-points 70000))]
        (should (> estimate 3.13))
        (should (< estimate 3.15)))))

The second part of this test is a little tricky. Our estimate is probabilistic, but with enough points it should almost always generate an estimate within range. Passing the test is easy:

core.clj
(defn estimate-pi [points]
  (* 4 (point-ratio points)))

All that’s left is to create an Incanter chart. We’ll start by plotting the circle itself, which means we need a function for the curve: solving x^2 + y^2 = 1 for y gives y = sqrt(1 - x^2). Here’s a test:

core_spec.clj
(describe "circle"
  (it "should equal 1 when x is 0."
      (should= 1.0 (circle 0)))
  (it "should equal 0 when x is 1."
      (should= 0.0 (circle 1)))
  (it "should equal 0.866 when x is 0.5."
      (should= "0.866" (format "%.3f" (circle 0.5)))))

To write the function, make sure you refer incanter.core/sqrt into your namespace:

core.clj
(ns montecarlopi.core
  (:require [incanter.core :refer [sqrt sq]]))

(defn circle [x] (sqrt (- 1.0 (sq x))))

At this point, writing tests gets a little hairy. The JFreeChart object on which Incanter plots are based doesn’t offer much in the way of public fields to test against. But we can at least check that the function returns a chart:

core_spec.clj
(describe "draw-circle"
  (it "should be a JFreeChart object."
      (should= "class org.jfree.chart.JFreeChart"
               (str (.getClass (draw-circle))))))

Refer incanter.charts/function-plot to plot the function:

core.clj
(ns montecarlopi.core
  (:require [incanter.core   :refer [sqrt sq view]]
            [incanter.charts :refer [function-plot]]))

(defn draw-circle []
  (function-plot circle 0 1))

Running (view (draw-circle)) from the REPL should produce a chart like this:

The last step is to add the points and some annotations to the Incanter plot:

core.clj
(ns montecarlopi.core
  (:require [incanter.core   :refer [sqrt sq view]]
            [incanter.charts :refer [function-plot
                                     add-points
                                     add-text
                                     set-x-label
                                     set-y-label
                                     set-y-range
                                     xy-plot]]))


(defn plot-points [chart points label]
  (let [xs (map first  points)
        ys (map second points)]
    (add-points chart xs ys :series-label label)))

(defn make-plot [n]
  (let [points (generate-random-points n)
        sorted (sort-points points)]
    (doto (draw-circle)
      (set-y-range -0.25 1)
      (plot-points (:inside  sorted) "inside")
      (plot-points (:outside sorted) "outside")
      (set-x-label "")
      (set-y-label "")
      (add-text 0.10 -0.10 (str "Total: "
                                (count points)))
      (add-text 0.10 -0.05 (str "Inside: "
                                (count (:inside sorted))))
      (add-text 0.10 -0.15 (str "Ratio: "
                                (point-ratio points)))
      (add-text 0.10 -0.20 (str "Pi: "
                                (format "%.4f" (estimate-pi points))))
      (view :width 500 :height 600))))

Here’s a plot and estimate with 100 points:

With 10000:

And with 100000:

A gist with all the code from this post is available here.

“Experienced Lisp programmers divide up their programs differently. As well as top-down design, they follow a principle which could be called bottom-up design–changing the language to suit the problem. In Lisp, you don’t just write your program down toward the language, you also build the language up toward your program. As you’re writing a program you may think ‘I wish Lisp had such-and-such an operator.’ So you go and write it. Afterward you realize that using the new operator would simplify the design of another part of the program, and so on. Language and program evolve together. Like the border between two warring states, the boundary between language and program is drawn and redrawn, until eventually it comes to rest along the mountains and rivers, the natural frontiers of your problem. In the end your program will look as if the language had been designed for it. And when language and program fit one another well, you end up with code which is clear, small, and efficient.”

Paul Graham, from the introduction to On Lisp

Lisp programmers have a reputation (earned or otherwise) for considering their language of choice uniquely powerful, capable of extending the limits of the world with its special expressiveness. I find writing Lisp a joy, and for a long time I bought into the mythos. It still feels unique. But I’ve come to learn that good bottom-up design is possible in any language.

For the past two weeks, I’ve been working in Java, a language many programmers consider Blub incarnate. (You don’t have to take my word for it). But even without clever macros and first-class functions, following the rules of test-driven development and the principles of clean code feels a lot like the process of natural, iterative evolution Paul Graham describes.

A good language can encourage bottom-up, evolutionary design, and this is one of Lisp’s great strengths. But writing good tests—and writing them first—can go a step further and actually enforce it.

Writing tests first requires describing abstractions before they exist—writing the program you want to read from the very start. Using meaningful names transforms the language you have into the one you want. Revising after every passing test makes simplifying design second nature. And building up a program test by tiny test is an evolutionary process that generates clean, efficient code, whether you’re writing Common Lisp or COBOL.

Here’s a function that returns a given game board’s winner from my first crack at Tic Tac Toe in Clojure:

tictactoe.core/get-win
(defn get-win
  "Takes a 3x3 game board. Returns a vector
  [winner start middle end] of the winning player
  and the (row, col) grid coordinates of the three-in-a-row elements."
  [board]
    (let [wins           (check-for-wins board)
          winner         (if (= 1 (first (remove nil? wins))) 1 0)
          [row col diag] (unflatten wins)]
      (cond (not-empty-row? row)  [winner [(get-row-win row) 0]
                                          [(get-row-win row) 1]
                                          [(get-row-win row) 2]]
            (not-empty-row? col)  [winner [0 (get-row-win col)]
                                          [1 (get-row-win col)]
                                          [2 (get-row-win col)]]
            (not-empty-row? diag) (if (= 0 (get-row-win diag))
                                    [winner [0 0] [1 1] [2 2]]
                                    [winner [0 2] [1 1] [2 0]]))))

And here’s the equivalent I wrote in Java:

Board.winnerIs( )
  public int winnerIs() {
      if (hasWin()) {
          return getWinningRow().winner();
      }
      return _;
  }

Which excerpt reads more like a domain-specific language? Which would you rather read a year from now? I don’t doubt that I could clean up the Clojure into something just as simple and readable. But merely using an elegant language is no guarantee of elegant design.

I saved the view components of my Tic-Tac-Toe game for last. The functionality I need to implement for a command line application is pretty simple: the view should be able to print game boards and messages to the terminal, prompt for user input and pass it off to a controller, and not much more. But doing this the TDD way was harder than I expected. Here are the solutions I found to two testing problems.

Testing text output

Printing a board should be a one-liner, especially since I added a toString() method to my Board objects. System.out.println(Board) ought to work just fine. But wait—the tests come first.

Writing tests against printed terminal output isn’t tough, but it does require some setup. System.setOut() redirects Java’s standard output to a stream of your choice. First, save the default System.out (it’s a PrintStream) so you can switch back to stdout later. Then initialize a ByteArrayOutputStream:

    @RunWith(JUnit4.class)
    public class TerminalViewTest extends TicTacToeTest {

        private final PrintStream stdout = System.out;
        private final ByteArrayOutputStream output = new ByteArrayOutputStream();
        private TerminalView terminalview;

Add a setUp() method with the JUnit @Before annotation—the usual pattern to run some code before your tests. Pass System.setOut() a UTF-8 PrintStream object constructed with the byte array (You are using UTF-8 everywhere, right?):

        @Before
        public void setUp() throws UnsupportedEncodingException {
            terminalview = new TerminalView();
            System.setOut(new PrintStream(output, true, "UTF-8"));
        }

Then test against output. If the terminal output is as expected, it should match the result of the printed object’s toString() method:

        @Test
        public void terminalViewShouldPrintBoards() {
            terminalview.print(nowins);
            assertEquals(nowins.toString(), output.toString());
        }

Finally, make sure you clean up and redirect output back to stdout, so later print statements don’t go missing:

        @After
        public void cleanUp() {
            System.setOut(stdout);
        }
    }

Now you’re ready to write that one-liner. You can find the whole test class as a gist here.

Testing Unicode output with Infinitest

With my test in place, I wrote a quick print method and waited for the green flash. And waited. And waited… I’ve been using Infinitest, an excellent continuous testing plugin for IntelliJ and Eclipse, but it threw an assertion error, even though the tests passed when run directly from IntelliJ.

The problem? I wasn’t using Unicode everywhere! (Don’t say I didn’t warn you.) My boards print with Unicode box drawing characters that threw off Infinitest. IntelliJ runs in a UTF-8 environment by default, but character encoding must be passed as an option to the Infinitest JVM. Fortunately, this is another one-line solution: add a text file named infinitest.args in your project’s root directory with one argument per line. In this case, just:

-Dfile.encoding=UTF-8

With tests back in the green, I was ready to start testing user input—but that’s a yak shave for another day.

git

Like many young whippersnapper kids these days, I started using GitHub before I really understood git. The benefits of version control were obvious right away, but I stuck to the very simplest git commands for a long time, and many of them turned into habits.

This week, I’ve been working on making cleaner commits and writing better messages. Composing good commit messages is the best git habit of all, and I’ve found two simple substitutions that nudge me towards more thoughtful commits.

If you’re used to git add -u or git add -A, try:

git add -p

Instead of staging everything at once, this walks you through the output of git diff, offering the option to stage each hunk of changed code:

@@ -34,5 +35,45 @@ public class GameTree {

+
+        public List<Node> getLeaves() {
+            List<Node> leaves = new ArrayList<Node>();
+
+            for (Node child : children) {
+                if (child.children.isEmpty()) {
+                    leaves.add(child);
+                } else {
+                    leaves.addAll(child.getLeaves());
+                }
+            }
+            return leaves;
+        }

Stage this hunk [y,n,q,a,d,/,K,g,e,?]?

(You can find the alphabet soup of staging options here). This is easier to read than a full diff, requires a careful review of each change in context, and makes composing small, atomic commits much easier.

Then, if you’re used to git commit -m <message>, try:

git commit -F <filename>

Instead of pulling a commit message from the command line or opening an editor, this uses an existing file as your commit message. The real trick is to keep an editor open in another window while git add -p pages through your hunks, and compose the commit message piece by piece as you stage.

There are many, many more git workflows, including ways to commit with reckless abandon and clean up later with git rebase, but these two simple changes have helped me make much better commits just by switching a few command line flags.

My apprenticeship at 8th Light began with the Three Laws of TDD:

  • Don’t write any production code unless it makes a failing test pass.
  • Write the most minimal test that is sufficient to fail.
  • Don’t write any more production code than necessary and sufficient to pass a failing test.

My first project obeying these new commandments is a rewrite of Tic-Tac-Toe in Java, a language completely new to me. Like all new projects, I began in a wilderness of error. IntelliJ was totally unfamiliar. Type declarations felt fussy and foreign. I just wanted to map over an array but had to settle for another for loop. And yet test by test, things slowly started to work.

After three days, I can’t claim to know Java. But I can claim that I know my Java works.

The clarity and certainty that come from good tests are feelings I’ve experienced before. Over the past few years, the practices and principles of Getting Things Done have transformed the way I work, think, and act. (Have I told you about our church? Would you like to read some inspirational literature?) Here is a formulation of the basic idea that might look familiar:

  • Don’t spend time and effort on anything but a project’s next incomplete action.
  • Choose the smallest next action sufficient to accomplish something useful.
  • Don’t spend more time or effort than necessary to finish an incomplete action.

To the uninitiated, writing tests and making lists might seem inflexible, robotic, and a little dorky. But TODO lists and test suites are really tools that externalize uncertainty. The vague sense that I really should do that thing becomes a clear action that will be in my inbox later. The nagging fear that a method might misbehave becomes proof that it does and will always do exactly what I expect.

I skated through school and college on cleverness and a keen balance of terror. And it worked! I wrote code that seemed to do what I told it long before I started writing tests. (Tests aren’t necessary for working code: they’re sufficient to prove that code works). But to claim that other ways work just fine is to miss the point. GTD and TDD are systems that radically, ruthlessly eliminate uncertainty—not because the rules are necessary in the strictest sense, but because they are sufficient to get things done, make things work, and do it all with a clear mind.