Thursday 30 July 2009

TDD: The door analogy

Recently, I was explaining test driven development to my wife and used a description involving the creation of a door, and I realised this may be a very good way to explain what TDD is, how it's meant to function and why it produces superior results. I've thought a bit more about the analogy and fleshed it out some, so here goes:

Peter has asked you to create a door so you go away and start writing some tests for what the door should do based on his statement of what he wants. You start initially with:
1) The door should have a handle
2) If you turn the handle and push then the door should open
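
Just to make the analogy concrete, this first iteration written as code might look something like the sketch below, in the plain-assert style of py.test. The Door class and its methods (has_handle, turn_handle, push, is_open) are hypothetical names invented for the analogy; the point is that the tests exist before any door does, so they all fail at first.

    # test_door.py -- written before door.py exists, so both tests fail to start with
    from door import Door  # hypothetical module; it hasn't been written yet

    def test_door_has_handle():
        door = Door()
        assert door.has_handle()

    def test_turn_handle_and_push_opens_door():
        door = Door()
        door.turn_handle()
        door.push()
        assert door.is_open()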

So you go away and create a door that fulfils these tests. You present this to Peter; he opens the door, and it does this by falling over. So you go back to the tests and add some tests you missed:
3) When the door is open, you should be able to pull on the handle and it will close
4) The door should stay upright when both open and closed

You then create this door (realising with these extra tests that you needed hinges) and present it again. Peter is happier with this new door, but then notices that if he pushes the door without turning the handle it still opens. This is another missed test so you add it to your tests:
5) If you push on the door without turning the handle the door should stay closed

You then create a door, adding a latch that retracts when you turn the handle, and test again. When running the tests you notice that, half the time, tests 3 and 5 are failing, and you realise it's because of the construction of the latch. If the door opens in one direction then the latch won't retract automatically when pulling the door closed. You go back to Peter, say you need to clarify what he wants to happen, and present him with the following alternatives:
1) The door can only open in one direction so you need to push the door from one side and pull the door from the other
or
2) In order to close the door you must turn the handle in order to manually retract the latch and close the door fully

Peter considers this and says he wants option one. This then causes a rewrite of the test cases to the following:
1) The door should have a handle
2) The door should only open in one direction
3) To open the door twist the handle and either push or pull. Only one of these should work depending on which side of the door you are on
4) When the door is open, you can close the door by performing the opposite action to the one used to open it
5) The door should stay upright when both open and closed
6) If you attempt to open the door without turning the handle the door should stay closed
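
Sticking with the hypothetical Door from the earlier sketch (now with a pull method as well), the interesting new rules, tests 2, 3 and 6, might be written roughly like this; these are the ones the previously built door is about to be run against.

    # Rewritten tests for the one-way door (numbers match the list above).
    from door import Door  # same hypothetical module as before

    def test_door_opens_in_one_direction_only():               # test 2
        door = Door()
        door.turn_handle()
        door.pull()                    # pulling from the 'push' side...
        assert not door.is_open()      # ...should not open it

    def test_turn_handle_and_push_opens_from_correct_side():   # test 3
        door = Door()
        door.turn_handle()
        door.push()
        assert door.is_open()

    def test_door_stays_closed_without_turning_handle():       # test 6
        door = Door()
        door.push()
        assert not door.is_open()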

You rewrite the tests and then run the previously constructed door through them to see where the problems are. This time, tests 2, 3 and 6 fail. Looking at the first of these, you see that the door opens in both directions, which is now a violation of the tests, so you add a ridge to the door frame that stops it opening in one direction and re-run the tests. This time no tests fail, so you present your door to Peter one last time; he is happy with it and installs the door all over his house.

So with this process, you have several iterations and each one improves the door. More importantly, each iteration adds more tests which show that the door is improved. Now, most people would think 'but a door is obvious, the final version is how I'd have created it initially', but consider... what if Peter had wanted a door that opened in both directions and required you to turn the handle to shut it? You would have created a door that didn't do that and wouldn't have identified the point where it was required. You would have just given Peter his door and he would have gone away less satisfied.

Also, this is a high level example. Consider what you'd do if you didn't know what a door was. You'd look at some doors and create something that looked like them, with no way of knowing whether it was correct or not. It may work initially, but after some improvements it suddenly starts falling over. Now you are stuck in the position of having no clue why it's falling over or what it was meant to do initially. So you go 'Right, it shouldn't fall over' and just prop it up so it won't fall over... but then the door doesn't open and you have an annoyed customer. If you had your tests there, you would be able to point to what it was meant to do (e.g. test 5, 'stay upright'), prop it up, and then when retesting spot that other tests are failing. So you go away and look at the problem some more, coming up with the solution that you need a third hinge on the door to reinforce it and stop it falling over after some use. You also add to your test cases:
7) The door should be able to be opened and shut multiple times without falling over
and present the new door to your customer, who is now delighted that you've solved the problem properly.
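
Test 7 is the sort of regression test that falls naturally out of this. A quick sketch of it, reusing the same hypothetical Door (with an is_upright method and an arbitrary number of cycles), could be:

    from door import Door  # hypothetical module from the earlier sketches

    def test_door_survives_repeated_use():      # test 7
        door = Door()
        for _ in range(1000):          # arbitrary cycle count for the sketch
            door.turn_handle()
            door.push()                # open...
            door.pull()                # ...and shut again
            assert door.is_upright()   # the third hinge is what keeps this passing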

The analogy is probably a bit strained now, but the principle still holds... the tests are there for more than just 'testing' the system. They are there as a verification, they are there as a safety net, they are there as your specification and they are there as your guide in an area you may not know much about. If you have something that isn't working correctly (due to a lack of understanding, for example), you should identify which test(s) are testing things incorrectly and then modify them to test for the correct behaviour. You then re-run these tests *without* changing any code (even if you 'know' what's wrong) so that you verify first that the new tests are failing. If you modify your tests and they still pass, then your tests are still wrong (as the program still has the incorrect behaviour), but if you modify your tests and your code at the same time and the new tests pass, you don't know whether it was because your new code works perfectly or because the modified tests are incorrect.

Now, I know this is all standard to avid TDD people, but I'm still getting up to speed on this methodology, and the reasons why I've avoided it are:
1) Difficulty of testing - Big things are hard to test, but the little things are trivial and seem like they don't need testing... test them anyway. You never know if you will find a bug there, and by testing the little things you are then able to break your big things down into smaller tests that *you've already written* and only test the small bit of new behaviour.
2) Benefits - Until you really think about the process and break down where the tests come in, TDD seems like a silly reversal. Why would it have advantages? Of course, with the above description the advantage is that you identify problems more quickly (in the third iteration, you immediately spot the test failures when adding the latch and ask the customer what he wants done when you realise it's due to mutually exclusive test cases). If you didn't have tests, or your tests were incidental things written after you finished creating your door the way you thought it should work, then you lose this benefit. There are other benefits that I'm getting clearer on, but they are left as an exercise for the reader ;)
3) 'But some things can't be tested' - This is a common concern, and it is false. Some people see the UI as untestable, but there are a lot of tools nowadays that allow you to test the UI in its entirety. And before you get to testing the full UI, you have a lot of components building it up. These CAN be tested. You can test that they change state as expected when called with fake input. You can get a component to draw to a bitmap and check that against a pixel-perfect bitmap of how it *should* look (see the sketch after this list). So you can verify every step of the way and build up bigger tests from well-tested components, making this exactly the same as reason 1.
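
As an illustration of that bitmap check, here is a minimal sketch in Python using PIL's ImageChops module. The render_widget_to_file helper is a hypothetical stand-in for whatever your UI toolkit provides to draw a component off-screen, and the file paths are made up for the example.

    from PIL import Image, ImageChops  # assumes PIL/Pillow is installed

    def images_match(rendered_path, reference_path):
        """Return True if the rendered image matches the reference pixel for pixel."""
        rendered = Image.open(rendered_path).convert("RGB")
        reference = Image.open(reference_path).convert("RGB")
        if rendered.size != reference.size:
            return False
        # difference() is black everywhere the two images agree, and getbbox()
        # returns None when there is no non-black region, i.e. a perfect match.
        return ImageChops.difference(rendered, reference).getbbox() is None

    def test_ok_button_renders_as_expected():
        # render_widget_to_file is hypothetical -- swap in your toolkit's
        # off-screen rendering call here.
        render_widget_to_file("ok_button", "output/ok_button.png")
        assert images_match("output/ok_button.png", "reference/ok_button.png")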

Those are the big reasons for me, and they are very much false reasons. I'm starting to get on board the TDD bandwagon and in the future I intend to have much better tests and try to write them before writing my code :) Of course, if I sometimes fail then it's not the end of the world, but I'll know what to blame when things start going wrong.

Thursday 9 July 2009

Reflections on Python, testing, Continuous Integration and other things

It's been a busy few weeks, both in work and out (for out of work, see Shared Illusions). In work, and of a technical nature, I've been focusing on testing and Continuous Integration and I feel it's time to look back and see what has come out of it.

Firstly, what I've been using for this. The list is:
  • Python
  • Py.test - My first test package. Very nice and easy to use but abandoned for reasons explained later
  • Nose - A nice package available for Python for running tests with a variety of options including profiling and code coverage (using coverage.py)
  • Hudson - A well featured build server package with a plugin architecture and several useful Python related plugins
It should be fairly obvious from my previous posts that I'm using mainly Python for work code at the moment. I've gone on at length about various Python packages and my 'fun' with transforming a complex XML structure into an equally complex and completely disjoint object structure :)

So, onto Py.test. This was my first pick for Python unit testing and I'd written a fair few tests with it. I've now moved away from it, mainly because it was a bit too simple for what I needed for CI. Basically, Py.test is an absolute gem for quickly writing and running unit tests. Simply prefix your tests with 'test_', run py.test in the project's root directory, and your tests will be run and the results printed out. The approach is great - it's simple, it's transparent and it's fairly stress free. It just doesn't provide some key features that Nose does, namely the ability to output the test report as an XUnit XML file and code coverage integration. For a lot of people these aren't necessary, but to get the most out of a CI build server they are very much desirable, as the build server can then generate nice graphs of test failures over time and give much more accurate reports on build status. Therefore Py.test was abandoned in favour of a more complete tool.
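
For anyone who hasn't seen it, the whole workflow really is that small. A minimal example (the file name and tests are made up for illustration) is just a module like this dropped anywhere under the project root:

    # test_basics.py -- py.test discovers any file/function prefixed with 'test_'
    def test_addition():
        assert 1 + 1 == 2

    def test_string_split():
        assert "a,b".split(",") == ["a", "b"]

Run py.test from the project's root directory and both tests are collected, run and reported, with no registration or boilerplate needed.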

That tool was Nose. This is a more fully featured test runner for Python, and it is pretty easy to transfer from Py.test to Nose (and also from PyUnit to Nose, if that's what you prefer). Moving from Py.test to Nose, you don't need to modify your tests at all unless you have function/method level fixtures. If you have these then you need to decorate the functions that use them with @with_setup(setup_func, teardown_func) from nose.tools... hardly a difficult change. Nose will then do the same thing as Py.test and discover all your test functions following a configurable naming convention (which defaults to 'test_'). For this minimal change, you are now able to run your unit tests with a wider range of report generators (including the aforementioned xunit output), profile your code and run code-coverage tools, all via Nose plugins and simple command line switches. Well worth it if Py.test just doesn't quite provide the test reports you need.
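
As a rough sketch of both points, here is what the fixture change and the command line look like. The test itself is a made-up example; with_setup comes from nose.tools, and the switches are the xunit and coverage plugins mentioned above.

    from nose.tools import with_setup  # Nose's function-level fixture decorator

    state = {}

    def setup_func():
        state["door"] = "closed"

    def teardown_func():
        state.clear()

    @with_setup(setup_func, teardown_func)
    def test_door_starts_closed():
        assert state["door"] == "closed"

Running the suite with, for example, 'nosetests --with-xunit --with-coverage' then produces an xunit-style XML report and a coverage summary for the build server to pick up.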

Finally, I needed an automated build server. While Python code isn't built/compiled, the other features of a build server were very much desirable, such as source code repository monitoring, automated running of unit tests and a nice web interface to view test results and previous builds through. I've previously encountered CruiseControl and was less than impressed with it (not particularly easy to configure, lacking in some key areas, and the interface is a bit clunky). I went looking and found Hudson. This is an 'evil Java app' that runs through Tomcat, but it is pretty damn good at what it does :) It provides a much cleaner web interface than CruiseControl, it lets you create new projects through the web interface (I previously had to write my own project creation for CruiseControl) and it lets you define build steps in the web interface using Ant targets, Maven targets or shell commands, with new options added via plugins. For all of that, I can stand the minor contamination of Java testing my Python code, especially as I don't have to write any Java code ;)

So, what was the purpose of all of these tools? Basically, they were to get my projects up to speed with automated unit testing. This is an area I've never managed to get fully on board with (it seemed a waste at uni and I didn't manage to get the build server set up for unit tests at EMCC before they went under), but its benefits are starting to show themselves. While I don't think the current projects I'm working on will get full unit test suites (not enough time to do so, as they are primarily research rather than production projects), I now have the necessary legwork out of the way to provide much more comprehensive testing in the future, and for the testing to be run easily and hassle free every time I commit a change.

Refs:
Hudson: https://hudson.dev.java.net/
Py.test: http://codespeak.net/py/dist/test/test.html
Nose: http://code.google.com/p/python-nose/
Python: http://www.python.org/