25 Dec 2009

Roy Osherove's TDD Kata: My first attempt

I recently came across Roy Osherove’s commentary on Corey Haines' attempt at Roy’s TDD Kata so I thought I’d try it out in C#.

Andrew Woodward has recorded his version of the kata where he avoids using the mouse for the whole exercise so I tried to avoid using the mouse as well and it was surprisingly difficult!

I’ve only done the first part of the exercise so far which is as follows:

Create a simple String calculator with a method int Add(string numbers)
1. The method can take 0, 1 or 2 numbers, and will return their sum (for an empty string it will return 0) for example“” or “1” or “1,2”
2. Start with the simplest test case of an empty string and move to 1 and two numbers
3. Remember to solve things as simply as possible so that you force yourself to write tests you did not think about
4. Remember to refactor after each passing test
Allow the Add method to handle an unknown amount of numbers
Allow the Add method to handle new lines between numbers (instead of commas).
1. the following input is ok: “1\n2,3” (will equal 6)
2. the following input is NOT ok: “1,\n”
3. Make sure you only test for correct inputs. there is no need to test for invalid inputs for these katas
Allow the Add method to handle a different delimiter:
1. to change a delimiter, the beginning of the string will contain a separate line that looks like this: “//[delimiter]\n[numbers…]” for example “//;\n1;2” should return three where the default delimiter is ‘;’ .
2. the first line is optional. all existing scenarios should still be supported
Calling Add with a negative number will throw an exception “negatives not allowed” - and the negative that was passed.if there are multiple negatives, show all of them in the

Mouseless coding

I know a lot of the Resharper shortcuts but I found myself using the mouse mostly to switch to the solution explorer and run the tests.

These are some of the shortcuts that have become more obvious to me from trying not to use the mouse:

I’m using a Mac and VMWare so I followed the instructions on Chris Chew’s blog to setup the key binding for 'Alt-Insert'. I also setup a key binding for 'Ctrl-~' to map to 'Menu' to allow me to right click on the solution explorer menu to create my unit tests project, to add references and so on. I found that I needed to use VMWare 2.0 to get those key bindings setup - I couldn’t work out how to do it with the earlier versions.
I found that I had to use 'Ctrl-Tab' to get to the various menus such as Solution Explorer and the Unit Test Runner. 'Ctrl-E' also became useful for switching between the different code files.

Simplest thing possible

The first run through of the exercise I made use of a guard block for the empty string case and then went straight to 'String.Split' to get each of the numbers and then add them together.

It annoyed me that there had to be a special case for the empty string so I changed my solution to make use of a regular expression instead:

CSHARP Regex.Matches(numbers, "\\d").Cast<Match>().Select(x => int.Parse(x.Value)).Aggregate(0, (acc, num) => acc + num);

That works for nearly all of the cases provided but it’s not incremental at all and it doesn’t even care if there are delimeters between each of the numbers or not, it just gets the numbers!

It eventually came unstuck when trying to work out if there were negative numbers or not. I considered trying to work out how to do that with a regular expression but it did feel as if I’d totally missed the point of the exercise:

Remember to solve things as simply as possible so that you force yourself to write tests you did not think about

I decided to watch Corey’s video to see how he’d achieved this and I realised he was doing much smaller steps than me.

I started again following his lead and found it interesting that I wasn’t naturally seeing the smallest step but more often than not the more general solution to a problem.

For example the first part of the problem is to add together two numbers separated by a comma.

Given an input of "1,2" we should get a result of 3.

I really wanted to write this code to do that:

CSHARP if(number == "") return 0;
return number.Split(',').Aggregate(0, (acc, num) => acc + int.Parse(num));

But a simpler version would be this (assuming that we’ve already written the code for handling a single number):

CSHARP if (number == "") return 0;
if (number.Length == 1) return int.Parse(number);
return int.Parse(number.SubString(0,1)) + int.Parse(number.SubString(2, 1));

After writing a few more examples we do eventually end up at something closer to that first solution.

Describing the relationships in code

I’m normally a fan of doing simple incremental steps but for me the first solution expresses the intent of our solution much more than the second one does and the step from using 'SubString' to using 'Split' doesn’t seem that incremental to me. It’s a bit of a leap.

This exercise reminds me a bit of a post by Reg Braithwaite where he talks about programming golf. In this post he makes the following statement:

The goal is readable code that expresses the underlying relationships.

In the second version of this we’re describing the relationship very specifically and then we’ll generalise that relationship later when we have an example which forces us to do that. I think that’s a good thing that the incremental approach encourages.

Programming in the large/medium/small

In this exercise I found that the biggest benefit of only coding what you needed was that the code was easier to change when a slightly different requirement was added. If we’ve already generalised our solution then it can be quite difficult to add that new requirement.

I recently read a post by Matt Podwysocki where he talks about three different types of programming:

Programming in the large: a high level that affects as well as crosscuts multiple classes and functions
Programming in the medium: a single API or group of related APIs in such things as classes, interfaces, modules
Programming in the small: individual function/method bodies

From my experience generalising code prematurely hurts us the most when we’re programming in the large/medium and it’s really difficult to recover once we’ve done that.

I’m not so sure where the line is when programming in the small. I feel like generalising code inside small functions is not such a bad thing although based on this experience perhaps that’s me just trying to justify my currently favoured approach!

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.