Dear Youse Bastards,
I recently grumbled about letting myself get stuck in a unit-testing quandary.
You responded in the comments with a cavalcade of recommended best practices, which I proceeded both to agree with in principle for many situations, and to stridently defend not using in my own situation – the payback wasn’t warranted.
I then proceeded to get back to work on changing a mid-level unit. Suddenly, the Refactoring Fairy came and whispered in my ear a suggested way of abstracting away one part of the ugly, third-party I/O layer.
It meant abandoning some of the progress I had made, but it was The Right Thing™, so I added a new class. Of course, I still had your silly ideas ringing in my head, so the new layer was written to support Dependency Injection (DI), allowing the layers below it to be replaced, and I also subclassed a Mock version so I could test its clients. When I unit-tested the new layer, I was careful not to over-test.
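In rough outline – with the names here invented purely for illustration, not the real classes – the new layer looked something like this:

class IOChannel(object):
    # Thin abstraction over the ugly third-party I/O layer.
    def __init__(self, transport):
        # The real transport is injected, so tests can substitute a fake one.
        self.transport = transport

    def send(self, message):
        return self.transport.write(message)

    def receive(self):
        return self.transport.read()

class MockIOChannel(IOChannel):
    # Mock subclass used when unit-testing the channel's clients.
    def __init__(self, canned_replies=None):
        # No real transport needed; just record what was sent and
        # hand back pre-arranged replies.
        self.sent_messages = []
        self.canned_replies = list(canned_replies or [])

    def send(self, message):
        self.sent_messages.append(message)

    def receive(self):
        return self.canned_replies.pop(0)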
Then I went back to the mid-level unit and refactored to use the new layer. The code became 43.0% more beautiful.
While I was updating the mid-level unit’s unit-tests, I could use the new mock objects, and it went very smoothly.
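For example – still with invented names, and building on the MockIOChannel sketched above – the mid-level tests could now be as simple as:

import unittest

class MidLevelProcessor(object):
    # Stand-in for the real mid-level unit; it talks only to an IOChannel.
    def __init__(self, channel):
        self.channel = channel

    def request_status(self):
        self.channel.send("STATUS?")
        return self.channel.receive()

class TestMidLevelProcessor(unittest.TestCase):
    def test_request_status_uses_the_channel(self):
        channel = MockIOChannel(canned_replies=["OK"])
        processor = MidLevelProcessor(channel)
        self.assertEqual(processor.request_status(), "OK")
        self.assertEqual(channel.sent_messages, ["STATUS?"])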
Now I could keep exactly the same API from the high-level unit to the mid-level unit, but simplify the handles to I/O objects, so I went ahead and did that… and while I did, I added a few more interface features to allow mock objects to be more easily used.
While changing the high-level unit – the one that starts “# TO DO: Refactor into smaller parts, that can be tested faster.” – to use the new mid-level API, I got the Refactoring Fairy to keep look-out to make sure no Project Manager was watching, and I hacked out the complete chunk that dealt with the lower-level unit, wrapped it in a pretty – and mockable – API, and tested it separately.
Finally, I stitched together the wounds of the top-level object, and looked at its unit-tests… the horrible ones that had hard dependencies on the lower-level objects and were the cause of the original complaint. With your fancy-pants concepts at the top of my mind, I could now see several problems with them:
- Of course, they had horrible dependency problems, which is why I originally complained.
- They were slow (dealing with timeouts, and the like, meant they were filled with sleep statements), which triggered the To Do comment.
- They were over-testing – spending effort on checking that the sub-layers were working correctly rather than focussing on the current unit being examined.
- Finally, they also tested items that were no longer part of the unit, because they had been cut away.
I substantially rewrote these tests, using the mock layers I now had waiting. The results were much shorter and much easier to understand. The number of test-cases requiring sleep statements was substantially reduced, and I also made the timeouts overrideable, so I could force timeouts by sleeping for 50 milliseconds rather than 5 seconds.
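Making the timeouts overridable amounted to little more than a constructor parameter; a sketch of the idea, names invented:

import time

DEFAULT_TIMEOUT_SECONDS = 5.0

class ResponseWaiter(object):
    # Polls a channel for a reply; tests pass a tiny timeout so the
    # timeout path can be forced in 50 milliseconds instead of 5 seconds.
    def __init__(self, channel, timeout=DEFAULT_TIMEOUT_SECONDS, poll_interval=0.01):
        self.channel = channel
        self.timeout = timeout
        self.poll_interval = poll_interval

    def wait_for_reply(self):
        # Assumes channel.receive() returns None when nothing has arrived yet.
        deadline = time.time() + self.timeout
        while time.time() < deadline:
            reply = self.channel.receive()
            if reply is not None:
                return reply
            time.sleep(self.poll_interval)
        raise RuntimeError("Timed out waiting for a reply")

# In a test: ResponseWaiter(mock_channel, timeout=0.05) hits the timeout
# branch after 50 milliseconds rather than 5 seconds.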
So, to summarise. Because youse bastards put your ideas about better testability in my head, and got both my internal Unit Test Advocate and the Refactoring Fairy all excited, I went ahead and wasted several days on refactoring – removing over-testing, adding two more layers of abstraction, and adding mocks to pretend to be those new abstraction layers. That was development time that I swore would not be worth it.
Benefits
The result is:
- cleaner code, on the whole. The increased complexity of DI was outweighed by the other simplifications.
- cleaner – and less fragile – tests. While the amount of code in each test-case was generally much smaller, that was outweighed by the new Mock implementations. There is more test code now, but it is easier to understand.
- a unit-test suite that runs about 2 minutes faster. I run the suite at least half-a-dozen times a day – often more, but I rarely sit around waiting for the result.
Limitations
Improved Bug Detection
I would have liked to say “And I found 5 serious bugs in my original code I had never caught before!”
However, that wasn’t the case. I only discovered one bug:
if number_of_unprocessed_items > 0:
    slowly_prepare_system_for_processing()
    process_remaining_items()
The idea of the condition here was to avoid the slow system preparation if there was nothing to do.
However, number_of_unprocessed_items was a function, not a number; I had omitted the parentheses. It should have been number_of_unprocessed_items() > 0. Python should know better than to define a greater-than operation on function pointers. That is the kind of mistake C makes, not Python.
The result was that the slow system preparation was always run – a small unnecessary performance hit that my unit-tests could never detect… until I mocked out slowly_prepare_system_for_processing and noticed it was being called more often than expected.
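The gist of how the mock made it visible – a cut-down sketch with invented names, shown here with the parentheses back in place:

import unittest

class Processor(object):
    # The dependencies are injected as callables, so the test below can
    # substitute cheap fakes for the slow real functions.
    def __init__(self, count_unprocessed, prepare, process):
        self.count_unprocessed = count_unprocessed
        self.prepare = prepare
        self.process = process

    def run(self):
        if self.count_unprocessed() > 0:   # the missing () was the original bug
            self.prepare()
            self.process()

class TestProcessor(unittest.TestCase):
    def test_preparation_is_skipped_when_there_is_nothing_to_do(self):
        calls = []
        processor = Processor(count_unprocessed=lambda: 0,
                              prepare=lambda: calls.append("prepare"),
                              process=lambda: calls.append("process"))
        processor.run()
        self.assertEqual(calls, [])   # with the bug, "prepare" showed up here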
No Dependency Injection Framework
I haven’t taken Sunny’s suggestion and found a DI Framework for Python. I suspect the demand for such a beast is lower in a language like Python, where classes can be passed as parameters. I did take a look at a major one, and particularly at its advertised benefits. There was a large list, and all but one of them (testability) were of zero interest to me on this project.
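For what it’s worth, “classes as parameters” covers most of what a framework would give me here; a minimal sketch, with invented names:

class RealChannel(object):
    def receive(self):
        # Placeholder for the slow call into the real third-party I/O layer.
        raise NotImplementedError("talks to the real I/O layer")

class FakeChannel(object):
    def receive(self):
        return "canned reply"

class ReportFetcher(object):
    # The class to instantiate is just another argument with a sensible
    # default; tests pass FakeChannel instead. No framework required.
    def __init__(self, channel_class=RealChannel):
        self.channel_class = channel_class

    def latest_report(self):
        return self.channel_class().receive()

# Production:  ReportFetcher().latest_report()
# In a test:   ReportFetcher(channel_class=FakeChannel).latest_report()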
Poorer Integration Tests
If Class A depends on B depends on C depends on D, then I am now testing:
- C works against mock D.
- B works against mock C.
- A works against mock B.
That’s good unit-test style, right?
Before I was testing:
- C works against mock D.
- B works against C running against mock D.
- A works against B running against C running against mock D.
I was encountering problems where A’s tests were affected whenever C changed.
But now I have opened myself to another risk: what if one of the many mocks doesn’t match the real classes very well? I am no longer performing integration tests that make sure my units fit together. Before, I was getting at least some reassurance that most of my code was working together.
I guess I now need to write some systems tests that exercise the entire system (or at least, as much as I can manage without causing the transfer of real money) with an understanding that these tests will be pretty fragile.
More bloody best practice that I would insist upon in any enterprise project, but have avoided on a single-developer, I-am-the-customer-so-no-acceptance-tests-required project.
I blame you.
Comment by Sunny Kalsi on February 21, 2011
I don’t understand. Your problem is “testing made my project better, and it’s encouraging me to do more testing?”
For “System tests” — yeah that’s true. I was alluding to this with my “trivial code” part in the last article. Basically you should be able to run through the system in an “arbitrary” way and be fairly satisfied that it works (as long as all the subsystems get instantiated).
Comment by Julian on February 21, 2011
That sounds like a Sanity Test a.k.a. a Smoke Test. I see that as checking (a) all the files have been included in the installer/deployed correctly, and (b) there isn’t some unlikely but highly embarrassing error that stops the program from running at all.
I’m talking about a different issue: There are always differences in the behaviour exhibited by Mock objects and the live objects. (Many reasons: one or both misinterpret the spec, different behaviour that is still in line with a flexible spec, the mock object implementation is not complete, boring old bugs in the mock object and/or bugs in the real object.)
How do I test for bugs that wrongly rely on the Mock object behaviour?
That way when my space-ship reaches Mars, I don’t find I was testing against a stub that was using SI units, and the driver was testing the component using square cubits, or whatever they use in backwards countries.
System Test also checks for other items, such as completeness – did the developers actually get around to completing all the requirements? And the often ignored validity tests – does the resulting software actually do what the customer needs, rather than what the requirements state?
I can afford to risk short-cuts there, given the very short distance between the customer and the developer.
Comment by Bork Blatt on February 22, 2011
As someone who is still doing most of his programming in Microsoft ASP Classic 3.0 – using top-down procedural code with rare exceptions – good writing! I almost understood some of it 🙂
Comment by Julian on February 26, 2011
Bugger. I just encountered precisely the problem I was talking about.
Abstract class Animal declared, but did not implement, a get_all_organs() method.
Concrete class Kangaroo defined a get_all_organs() method.
Mock class MockAnimal, used for testing, defined a get_all_organs() method.
I decided that clients of Animal should not be able to get at its internals – that was poor encapsulation.
I removed the declaration from Animal.
I removed the definition from Kangaroo.
I neglected to remove the definition from MockAnimal. No biggie, but the mock no longer matched the real object.
I removed from the client a call to get_all_organs… but I missed that there was a second one. Whoops!
So, I tested Kangaroo without invoking the obsolete method and it worked.
I tested the client with MockAnimal, and it worked. It secretly called the obsolete method, and the mock object secretly responded. (This is Python, so calling a method that no longer appears on the abstract class, but is still defined on the object actually passed in, is fine.)
Testing passed, so I pushed it to production.
When the real code ran, with the client calling Kangaroo instead of MockAnimal, it tried to call get_all_organs() and exploded.
Fortunately, the problem was quickly detected and fixed.
As I warned myself, I need fragile functional tests, not just robust unit-tests.
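In condensed form – with the details invented, but the shape of the mistake preserved – the mismatch looked like this:

class Animal(object):
    # get_all_organs() declaration removed from here...
    pass

class Kangaroo(Animal):
    # ...and its definition removed from here...
    def hop(self):
        return "boing"

class MockAnimal(Animal):
    # ...but forgotten here, so the mock still answers the obsolete call.
    def get_all_organs(self):
        return ["heart", "pouch lining"]

class Client(object):
    def __init__(self, animal):
        self.animal = animal

    def inspect(self):
        # The second, overlooked call to the obsolete method.
        return self.animal.get_all_organs()

# Client(MockAnimal()).inspect()  -> passes in the unit tests
# Client(Kangaroo()).inspect()    -> AttributeError in production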