Hillel Wayne gave an example problem. The idea, as I understood it, was for people to solve it via property-based testing (PBT) or test-driven design and see what the differences were. In particular, I'm interested whether I missed any bugs that could have been found with PBT.

I don't know if that will happen, but I'm writing up my process so that it can be compared to Johannes Link's PBT process.

### Start

commit

As is my habit, I started with a trivial test and an implementation that returns only a constant value. The test and code are in the same file for now, just for convenience.

``````  def can_afford?(_budget, _bill) do
true
end

test "total budget always passes" do
budget = %{total_limit: 50}
bill = %{cost: 50}

assert can_afford?(budget, bill)
end``````

Note that I reflexively test a boundary where the bill is the same as the `total_limit`. Note also that I got the structure of the bill wrong: it should be a list of maps, not a map. I noticed that when writing the next test case.

### Boundaries

commit

The next test might as well try both sides of the boundary:

``````  test "total budget boundaries" do
bill = [%{cost: 50}]

assert can_afford?(budget(50), bill)
refute can_afford?(budget(49), bill)
end``````

`budget` is just a function that avoids a little creational clutter:

``````  def budget(total_limit) do
%{total_limit: total_limit}
end``````

The bill is a list, which means reaching into the toolbox of mapping functions ( `for` is just syntactic sugar for `Enum.map`, and `Enum.sum` is just a simple use of `Enum.reduce` ):

``````  def can_afford?(budget, bill) do
total_charge =
(for item <- bill, do: item.cost)
|> Enum.sum
total_charge <= budget.total_limit
end``````

As is my habit, I'm not going to bother testing different sizes of lists. It's hard to mess up use of mapping functions, so a singleton list suffices.

### Counts

commit

The irregularity of some bill items having counts and some not is already bugging me, so I'll tackle counts next. Here's a bill with both types of item:

``````    test "bill items can have counts" do
bill = [%{cost: 20}, %{cost: 20, count: 2}]
assert can_afford?(budget(60), bill)
refute can_afford?(budget(59), bill)
end``````

Note that I'm maintaining the boundary cases, as that's a fine way to see if the explicit count is taken into, um, account.

For no wildly compelling reason, I split `can_afford?` into two responsibilities. The new work is done in lines 3-4:

``````  def total_charge(bill) do
item_charges = for item <- bill do
count = Map.get(item, :count, 1)            ##### 3
item.cost * count                           ##### 4
end

Enum.sum(item_charges)
end

def can_afford?(budget, bill) do
total_charge(bill) <= budget.total_limit
end``````

### Normalization

commit

I started working on categories, but was immediately bugged by another irregularity: some items have categories and some don't. I decided to create consistency in a "normalization" step. For example, every item will have a `categories` key, whose value may be an empty list.

Instead of forcing every item to default to a `count` of `1`, I decided to split any item with a count of `n` into `n` items, for two reasons:

1. That way, there's only one place in all the code that will ever have to worry about counts.
2. Without thinking ahead too much, I suspected that would make later code easier. The Hillel doc's "should be as permissive as possible" requirement must mean that sometimes, given an item with count `2`, the item's total `cost` will have to be divided up between two categories. That seems like it will come for free if I have two items.

I started by normalizing a single item into `n` items, removing the `count` field. You'll note that I began to indulge in tabular tests, the way I often do. (It looks a bit prettier in the original source because the lines can be longer.)‌

``````    test "normalizing a bill" do
expect = fn bill, expected ->
assert normalize(bill) == expected
end

%{cost: "c"}
|> expect.([%{cost: "c", categories: []}])
%{cost: "c", categories: ["a"]}
|> expect.([%{cost: "c", categories: ["a"]}])
%{cost: "c", categories: ["a", "b"], count: 2}
|> expect.([%{cost: "c", categories: ["a", "b"]},
%{cost: "c", categories: ["a", "b"]}])
end
``````

The solution is, again, a straightforward mapping:‌

``````  def normalize(bill) when is_map(bill) do
cost = bill.cost
categories = Map.get(bill, :categories, [])
count = Map.get(bill, :count, 1)

Enum.map(1..count, fn _ ->
%{cost: cost, categories: categories}
end)
end``````

I'm using a typical idiom, converting an item into a list of one or more items, with the idea of "flatmapping" (concatenating) them together:

``````  def normalize(bills) when is_list(bills) do
Enum.flat_map(bills, &normalize/1)
end``````

Notice that I'm doing (in effect) "polymorphic type dispatch." If the thing being normalized is a map, I assume it's an item. If it's a list, I assume it's a list of items. I won't be surprised if I turn the idea of "item" from a convention into a real datatype, but I'm in no hurry.

That second `normalize` variant was driven by this test:

``````    test "normalizing bills" do
actual = normalize([%{cost: 20, count: 2}])
assert actual == [%{cost: 20, categories: []},
%{cost: 20, categories: []}]
end``````

Once again, I'm using a single element list because I trust that Elixir has implemented `flat_map` correctly.

### Cleanup

I noticed sloppiness and bad naming. For example, `normalize` used the names `bill` and `bills` when it should have used `item` and `bill`.

I decided I preferred the idea of `remaining_total_after_bill` to `total_charge`, so I changed the code to implement that. (I think I thought the first abstraction would fit later code better. That turns out to be wrong, so I should have waited until I knew the first abstraction was better.)

I also decided to put the test and product code into different files, since I was starting to have to scroll around to find things.

### One category

commit

For the moment, I'll work only on debiting of category limits, without worrying about the enclosing structure with its `total_limit` and `categories` fields. So the first step is to show that an existing category limit can be debited. That is, given a category `cat` with limit `10`, debiting `8` from it will leave the limit at `2`:‌

``````  describe "debiting of categories" do
test "debit of existing category" do
actual = remaining_category_total_after_bill(%{cat: 10}, :cat, 8)
assert actual == %{cat: 2}
end
end``````

I chose that function signature because it mimics existing Elixir Kernel functions. So the implementation is a standard `Map.update`:

``````  def remaining_category_total_after_bill(
categories, category, bill_amount) do

Map.update!(categories, category, &(&1 - bill_amount))
end
``````

Notice that I confused "bill" and "item" again in creating `remaining_category_total_after_bill`. It really updates one category limit with the cost of one bill item. Argh!

I frequently get names wrong, which is part of the reason I have so many cleanup breaks while coding. That's when I most often notice the bad names. While coding, my brain seems to treat them as arbitrary tokens.

### A missing category

commit

The requirements are clear that a category named in an item might not exist in what I'm starting to think of as the "limit holder" (the map from category names to integers). To make this case visually different from the previous one, I use tabular tests again:

``````  test "debiting of categories" do
expect = fn [categories, category, amount], expected ->
actual = remaining_category_total_after_bill(categories, category, amount)
assert actual == expected
end

[%{category: 10}, :category, 8] |> expect.(%{category:  2})
[%{category: 10}, :MISMATCH, 8] |> expect.(%{category: 10})
end
``````

‌That requires behaving differently if the limit holder doesn't contain the category. Two test cases, two branches:

``````  def remaining_category_total_after_bill(
categories, category, bill_amount) do

case Map.has_key?(categories, category) do
true ->
Map.update!(categories, category, &(&1 - bill_amount))
false ->
categories
end
end``````

### Cleanup, and `Item`

Time for cleanup. One way to avoid my constant confusion between bills and bill items is to make "item" into a proper structure with its own constructor:

``````defmodule HillelBudget.Item do
defstruct [:cost, :categories]

def item(cost, categories),
do: %__MODULE__{cost: cost, categories: categories}``````

I also moved all the normalization code into `Item`, because the module is all about items and lists of items.

The constructor makes creation of items in tests a bit less noisy:

``````                      |> expect.([item("c", ["a", "b"]),
item("c", ["a", "b"])])``````

... and I think using the real word `item` instead of a literal map, and a list of `HillelBudget.Items` instead of a literal list of literal maps will avoid some of my bad naming.

Were you to say I should have created `Item` earlier, I wouldn't argue.

### Normalization, again

commit

There's a certain amount of messiness with the previous test. I'm using `"c"` there to represent the idea of "some irrelevant cost". (I used `"c"` instead of `"cost"` to avoid line wrapping – a bad decision.) In my Midje test framework for Clojure, I gave such "metaconstants" their own notation, but I can't do that in Elixir. What I should do is something like this:

``      c = 535 # This represents some random cost.``

... and use `c` from then on. You could certainly argue that I ought to make `c` a variable that gets assigned different values by a PBT generator. Except I think that signals the wrong thing: that the value of `c` might matter to the code, whereas I want to signal that it does not.

Nevertheless, that made me realize something. I was following Hillel's examples and using strings for category names, but my previous test used atoms (symbols). In Elixir, atoms are more convenient than strings for things like keys (as well as being easier to type on my keyboard and standing out better with my code colorizer), so I decided to convert categories in the normalization step. For example:

``````     %{cost: c, categories: ["a"]}
^^^
|> expect.([item(c, [:a     ])])
^^``````

(I know such a conversion is not a good idea if you're coding a web server that handles floods of untrustworthy user data. But I'm not.)

### Making "limit holder" more concrete

commit

I'd already been thinking of the map from category names to category limits as a "limit holder." That seems an overly-general name: why not use the word "category"? Because I'd noticed something about a budget:

``````%{total_limit: 5,
category_limits: %{
category_a: 1,
category_b: 3
}}
``````

The whole looks a bit like a reflection of the part. That is, the whole budget can be seen as a limit holder for the category `total_limit`, if you sort of squint and ignore `category_limits`.

So I used the more general name in a vague hope that maybe I could somehow someday cleverly use the same code for the total limit as for the category limits.

So all the category work from now on will happen in module `HillelBudget.LimitHolder`. I even added documentation!

``````@moduledoc """
A limit-holder is a map of key-value pairs where the pairs are
integers that represent a limit on spending. The limits may be
negative if spending has exceeded the limit.
"""
``````

Note, however, that I didn't make a `LimitHolder` named structure. The `LimitHolder` module just groups conventions for dealing with maps of a certain "shape."

I also renamed  `remaining_category_total_after_bill` to  `LimitHolder.decrement`.

### In which I disappoint Bertrand Meyer

commit

Because of the requirement that `can_afford?` must answer "yes" if there's any possible assignment of costs to categories that keeps them all in budget, we have a classic search problem. My original intention was to just brute-force generate all possible assignments and see if any worked.

But I decided to be at least a little ambitious. If the code discovered that item five of ten in a bill made success impossible down one branch of the search tree, it shouldn't generate the remaining items since they were all guaranteed impossible.

In practical terms, that means: now that I can `decrement` a limit holder with data from one item, I need to figure out how to integrate decrementing with the processing of a whole bill.

I did a little fiddling around ("spiking"). It seemed more convenient if `decrement(limit_holder, category, amount)` either returns a decremented limit holder or the magic value `:overdrawn` to indicate there's no point in going on. (Essentially, this is like the `Maybe` or `Either` structure in languages that eschew nulls.)

I'd read Bertrand Meyer's Object-Oriented Software Construction (1988) when it came out, and he made a strong case that you should have separate methods for "do this!" and "what happened?" Even though I'm programming in a functional language, I felt guilty for violating his guideline, but doing that seemed right. Or at least not wrong. So:

``````  # Forgive me, Bertrand Meyer, but combining the update and
# the query seems to make handling missing keys easier.``````

That change is driven by a tweaked test for `decrement`:

``````  test "debiting of values" do
expect = fn [holder, key, amount], expected ->
assert LimitHolder.decrement(holder, key, amount) == expected
end

[%{key: 10}, :key,      8] |> expect.(%{key:  2})
[%{key: 10}, :MISMATCH, 8] |> expect.(%{key: 10})

# Overdrawing
[%{key: 10}, :key,      10] |> expect.(%{key: 0})
[%{key: 10}, :key,      11] |> expect.(:overdrawn)
^^^^^^^^^^
end

``````

And the code grows a new case:

``````  def decrement(holder, key, amount) do
case Map.has_key?(holder, key) do
true ->
new_balance = Map.get(holder, key) - amount
if new_balance < 0 do                   #<<<<#
:overdrawn                            #<<<<#
else
Map.put(holder, key, new_balance)
end
false ->
holder
end
end
``````

(I'm not wild about nested cases, but I'll tolerate this one.)

### Bad programmer! Bad!

Side note: As I write this blog post, I've just discovered that the last sentence of my initial comment for module `LimitHolder` was made false by the commit I just described:

``````  @moduledoc """
A limit-holder is a map of key-value pairs where the pairs are
integers that represent a limit on spending. The limits may be
negative if spending has exceeded the limit.
"""
``````

A `LimitHolder` may now never be decremented to a negative value. (Note: I asked Hillel if they could ever start with a negative value, and he said I should assume not.)

I'm not as hard-core as some people about shunning comments, but this is an example of their danger: saying nothing is often better than telling a truth that becomes a lie. Yes, I should be scrupulous enough to review comments after making a change. But I'm bad at both writing and reading comments when I'm in "coding mode," for the same reason, I suppose, that I'm bad at choosing names. I need a specific cleanup phase. (But that wasn't enough, because the wrong comment remains in the final commit. I probably never looked at the paragraph again. I need a cleanup checklist.)

### Breadth-first

commit

Now I have to handle constructing the search tree. The easy cases are where items have zero or one categories. The test is a little awkward to read because it takes a single bill (`original`) and applies an item to it.

``````    test "the easy category cases: 0 or 1" do
original = [%{category: 10}]

expect = fn item, expected ->
assert LimitHolder.apply_item(original, item) == expected
end

# cases that do not apply
item(10, [     ]) |> expect.(original)
item(10, [:miss]) |> expect.(original)

# one category, and it matches
item(10, [:category]) |> expect.([%{category: 0}])
item(11, [:category]) |> expect.([              ])  # overdrawn
end
``````

I can't show the code that passes the test because I didn't commit it. Instead I wrote tests for the cases where multiple properties caused a split in the search tree:

``````test "the cases that require splitting" do
original = [%{cat_a: 10, cat_b: 10}]

expect = fn item, expected ->
assert LimitHolder.apply_item(original, item) == expected
end

# the need to split
item(10, [:cat_a])         |> expect.([%{cat_a:  0, cat_b: 10}])
item(10, [:cat_b])         |> expect.([%{cat_a: 10, cat_b:  0}])
item(10, [:cat_a, :cat_b]) |> expect.([%{cat_a:  0, cat_b: 10},
%{cat_a: 10, cat_b:  0}])

# overdrawn cases are pruned
item(11, [:cat_a])         |> expect.([])
item(11, [:cat_b])         |> expect.([])
item(11, [:cat_a, :cat_b]) |> expect.([])
``````

Now we see why `:overdrawn` is useful. We can prune limit holders from the growing tree without having to look inside them to see which, if any, categories are overdrawn:

``````  def apply_item(holders, item) do
case item.categories do
[] ->                           ###### 3
holders                       ###### 4
categories ->
Enum.flat_map(categories, fn category ->
for holder <- holders, do: decrement(holder, category, item.cost)
end)
|> Enum.reject(&(&1 == :overdrawn))    ####### 9
end
end
``````

As I look at this now, I don't see why the first case (lines 3-4) is there. I suspect it's a holdover from the simpler code that handled the zero- and one-category cases.

Note again that line 9 is justified by the desire to avoid wasted work. At this point, I decided to keep putting efficiency hacks in. That's in order to give PBT more to chew on, to make it more likely to find bugs. (Ordinarily, I'd want evidence that they were needed before adding them.)

### Cleanup

commit

I decided to do some cleanup. But instead of discovering the unnecessary code, I just repeated it in a different form:

``````  def apply_item(holders, %Item{categories: []}), do: holders

def apply_item(holders, item) do ...
``````

Sigh.

### Plugging categories into `can_afford?`

commit

`apply_item` is still only called by tests. Next task is to start using it for real. First step is to sketch a new version of `can_afford?`

``````  def can_afford?(budget, bill) do
items = Item.normalize(bill)

total_ok = the_total_budget_is_ok(budget.total_limit, items)
categories_ok = true    # <<<<<<<<<<<<

total_ok && categories_ok
end
``````

Notice that I've changed `remaining_budget_after_bill` to `total_budget_is_ok`, my third version of the same basic concept.

This required no changes to the test because the highlighted line is just a placeholder.

### Implementing `surviving_holders`

commit

`categories_ok` will be true whenever applying each bill item to the categories' limit holders leaves some "alive." So I wanted a function, `surviving_holders`, that takes some `LimitHolders` and a bill, does the splitting and pruning called for by the `Items` in a bill, returning a list of limit holders that represents all possible allocations of costs to categories (all those that don't violate any category limits).

That sounds complicated, but the hard work is already done by `apply_item`. I knew the function would look something like this:

``````  def surviving_holders(holders, items) do
Enum.reduce(items, holders,
fn item, acc -> apply_item(acc, item) end)
end``````

... and, indeed, that's exactly what it looks like.

Despite knowing the code would be simple, I decided to write a somewhat elaborate test to show the splitting-and-pruning logic. It has three parts. It starts with the boring case of a single-category item where the category (here, `cat_a` ) matches:

``````  describe "surviving holders" do
test "surviving holders" do
original = [%{cat_a: 10, cat_b: 10}]
^^^^^^^^^

one = item(5, [:cat_a])
^
one_step = LimitHolder.surviving_holders(original, [one])
^^^^^
assert one_step == [%{cat_a: 5, cat_b: 10}]
^^^^^^^^``````

My next step added more lines to that test (so that I could reuse the item named `one`), and demonstrated splitting. Here, a single `LimitHolder` spawns two:

``````      two = item(5, [:cat_a, :cat_b])
two_step = LimitHolder.surviving_holders(original, [one, two])

assert_good_enough(two_step,
in_any_order([
%{cat_a: 0, cat_b: 10}, # 5 subtracted twice from A
^^^^^^^^
%{cat_a: 5, cat_b: 5}   # 5 taken once from A and B
^^^^^^^^
]))
``````

(`assert_good_enough` and `in_any_order` are from my `flow_assertions` package.)

Finally, I wanted to demonstrate how holders might be pruned (that is, the handling of `:overdrawn`). Notice below that the result that would come from subtracting `5` from `cat_a` three times (yielding `-5`) is omitted:

``````      three = item(5, [:cat_a, :cat_b])
three_step = LimitHolder.surviving_holders(original, [one, two, three])

assert_good_enough(three_step,
in_any_order([
%{cat_a: 0, cat_b: 5},
%{cat_a: 0, cat_b: 5},
%{cat_a: 5, cat_b: 0},
]))
end
end``````

It's reasonable to argue I should have written just enough of a test to force the simple implementation of `surviving_holders`. But this is really the only test that shows how the whole thing works. As such, it might serve as documentation. It also served as some reassurance that I really understood how everything fits together. (That is, it's what Bill Wake calls confirming tests.)

### Optimization, again

commit

I made a couple of speculative optimizations to `surviving_holders` to see if I might introduce bugs PBT would find. See the commit for details. They just slightly complicate the code.

### Connecting `surviving_holders` and `can_afford?`

commit

What remained was to force `can_afford?` to actually use `surviving_holders`. Any budget that violated a category limit without violating the `total_limit` would suffice. I used this:

``````   test "total budget boundaries" do
budget = budget(100, a: 5)
bill = [%{cost: 20, categories: ["a"]}]

refute can_afford?(budget, bill)
end``````

That led to replacing `categories_ok = true` with some real code:

``````  def can_afford?(budget, items) do
items = Item.normalize(items)

total_ok =
the_total_budget_is_ok(budget.total_limit, items)
categories_ok =
all_category_budgets_are_ok(budget.category_limits, items)
#^^^new^^^new^^^new^^^new^^^new^^^new^^^new^^^new^^^new^^^

total_ok && categories_ok
end

def all_category_budgets_are_ok(category_limits, items) do
not Enum.empty?(LimitHolder.surviving_holders([category_limits], items))
end``````

At this point, I consider myself done. I have code I believe works.

### One last performance hack

commit

However, the existing code checks all the categories even if the `total_limit` is exceeded, so that seemed worth fixing:

``````  def can_afford?(budget, items) do
items = Item.normalize(items)

if the_total_budget_is_ok?(budget.total_limit, items),
do: all_category_budgets_are_ok?(budget.category_limits, items),
else: false
end``````

Since there's no change to behavior, there's no need for new tests.

### Tests as API documentation

commit

As a final flourish, I decided to add "doctests": runnable tests within the `can_afford?`'s documentation. I used the examples given in Wayne's specification. I admit that I wanted to see them work to get that final bit of confidence in my code.

Even if you rationally know you're justified in believing your process solved the problem, it's always nice to check and see.

As I typed them up, I wanted to expand on the examples a bit. I added a new case, wrote something like "... and this bill will cause `can_afford?` to be false..." – only to discover it actually returned true.

Rather than walk through the calculations myself, I decided to write an `affordability` function that would call the same functions as `can_afford?` but return rich results instead of funneling all the possibilities into `true` or `false`.

So `affordability` would return either this:

``{false, "Total limit is exceeded by #{-remaining}."}``

... or this:

``{false, "There is no way to avoid *some* category's limit being exceeded."}``

... or this:

``````{true, "Total budget remaining: #{remaining}",
"Possible category budget allocations follow",
surviving_holders}``````

... so it would be more clear why what I expected was what not what I got.

This was just a matter of replacing code that had returned `true` or `false` with code that returned the above annotated values.

Then I wrote `can_afford?` in terms of `affordability`:

``````  def can_afford?(budget, items) do
case affordability(budget, items) do
{false, _} -> false
{true, _, _, _} -> true
end
end``````

I want to call this out because boolean functions are dangerous. They funnel down a vast number of possibilities into two values. It's very easy for an incorrect intermediate result to produce the correct final result just because of (bad) luck.

For that reason, it's better to test interior functions like `affordability` and separately test the reduction of detail to `true` or `false` than it is to test only through the narrow boolean interface.

I converted my tests to do that. If you look at the complete test suite, almost none of it is "about" `can_afford?`.

### The final version

I did a bit of cleanup and flailing around with documentation. None of it is worth describing. But the final result is here.