Posts in "Testing"

Using syntax highlighting to improve the readability of your tests

One thing I obsessively focus on is test readability. The reason being that it’s often very hard to understand what a test is really about; usually the signal-to-noise ratio in tests is very low, and that makes it hard to understand what is being tested. So I continuously try to improve the readability of my tests by experimenting with different styles. Yesterday I was working with my colleague Sander, and we discovered yet another trick that I think can help: syntax highlighting.

Consider this piece of code:
It tests for certain SEO properties (the contents of the robots meta tag in the HTML) based on the current status of the specific page.

Now, I think these tests already have a relatively high signal-to-noise ratio. The thing is at a pretty high level of abstraction, and you can probably follow what’s going on as non-technical person. It is, however, less clear what the difference are between these two tests. In fact, only 8 characters are different, out of a total 312 characters. It’s really hard to spot what those differences are when you start reading this code.

Now consider this form:


While the number of changed characters is exactly the same, I think both the original intent and the differences between the tests are way more clear. When I get back to this code, my eyes will immediately be drawn to the strings, which happen to also capture the essence of the tests, which is what we’re after.

Improving refactoring opportunities by reducing the amount of public contracts

Last week I had an interesting conversation with some of my BuildStuff friends about every developer’s most favorite topic: documentation. The specific problem was how to keep documentation in sync with the code and how to handle that. This resulted in João referring to the docs of a project he’s working on, logstash. In logstash most (all?) the documentation is generated from code and they have success using that method. Still, I think everybody knows of situations where your xmldocs, phpdocs or just plain comments diverge from the actual code that it describes. This will obviously cause problems when you generate docs off of it. So why does it work for logstash?

Well, we came up with the notion that what you actually can do is document code/apis/whatever that are publicly available. The reason being that this stuff is already being used by other people, and therefore you already shouldn’t be breaking it. When you change something you need to make sure that other people’s stuff keeps working. Therefore, in these scenarios, documentation will not get out of sync since the code that is being documented can simply not change.

In the case of logstash, they better make sure they are backward compatible with older versions (at least within a major version), otherwise nobody would ever upgrade or even use it; it would be too much of a hassle reconfiguring on every release. For that reason I think it actually is viable to generate documentation for things like configuration files: they shouldn’t change in a breaking way anyway.


So what does this have to do with refactoring? Well, basically, everything. It turns out that the only code you can refactor is code that nobody else directly depends on, that is, code that is not-public. The key being the word public: as soon as a piece of code becomes public you introduce a contract with the (potential) user(s) of this code. It doesn’t really matter if you expose it via a class library, API or even a user interface: as long as it’s publicly exposed, you have a contract with its users. That also means that from now on, you won’t be able to refactor the interface of whatever you’ve exposed.

One example I really like to illustrate this point is the bowling game example from Uncle Bob’s Agile Software Development: Principles, Patterns, and Practices. In this example, Uncle Bob implements a game of bowling doing TDD and pair programming, and in the course of implementing it, he lays out their thinking patterns. I’m not going to recount the entire story here, but like to focus on one of the acceptance tests (for more code see

public void test_example_game() {
    int[] throws =  new int[] { 1,4,4,5,6,4,5,5,10,0,1,7,3,6,4,10,2,8,6 };

    var game = new Game();

    foreach(var throwScore in throws) {


Their goal is to make this test pass, and they proceed by thinking about (actually exploring) a couple of ways of designing a solution. They start with a solution that uses Frame and Throw classes (nouns from the domain of bowling) next to the Game class, but can’t really find a way to make that work nicely, so they decide not to use those classes. The final solution they presented was very simple and used only two classes. What’s important here is that from the point of view of a consumer, it doesn’t really matter how it’s implemented. The consumer, as defined by the acceptance tests, only cares about the Game class. The way you implement it doesn’t really matter, as long as you adhere to the contract specified by the acceptance test.

No problems so far, but what I see happening a lot is that when someone comes up with an implementation that does use a Frame and/or Throw class, those classes are also made publicly available. This happens, for example, when you reference them in a unit-test driving the development. Doing that, however, implicitly introduces a contract you now need to maintain. And this contract has nothing to do with the behavior of the system, but only with the specific implementation. As a side effect, you’ve now significantly reduced the opportunity to refactor because the implementation that doesn’t use the the Throw and Frame classes is not available as a solution anymore. Instead, you now need to maintain three classes and its methods (at least 4, I guess), instead of just the one class with 2 methods. In a real project, this build-up of contracts becomes a burden really fast, and it can completely kill the adaptability of your application, because you have to respect all the existing contracts.

Testing & mocks

All this stuff is also very tightly related to testing with mocks. I’ve personally stepped into the trap of testing every interaction between collaborating objects more than once. The canonical way of testing those interactions is using mocks and substituting one of the collaborators by a mock of its interface. The problem with this strategy is that every class and interface used to implement some sort of behavior must become public:


On the left, we have 6 public interfaces, while on the right we have just 1. This results in us having to maintain 6 contracts on the left, while only 1 on the right. In my experience, maintaining 1 contract is just easier than 6. DHH, Kent Beck and Martin Fowler talked about having tests coupled to implementation in their Is TDD dead? sessions, and this is exactly what that is. By mocking every interaction, you completely coupled tests to implementation, and have no room for refactoring. You experience the binding of this contract directly when you refactor the code and also need to refactor the tests. That’s not how it should work.

Also, from an integrated product point of view you need to setup elaborate IoC container configurations to get your app running in the right way, and it’s the client that makes those decisions. All this just doesn’t feel like information hiding to me.

So, to be honest, in the classicist vs mockist discussion, I really think you should lean strongly to classicist. I do use mocks, but only for stuff that involves the external world, or is otherwise slow, such as a mail server, database or web service. This is something, however, that I continuously re-evaluate (can I run an in-proc mail server/database, for example).


I call the contracts of my system the Public Surface Area of my system. For the reasons described above, I always try to optimize for an as-small-as-possible (as little as possible contracts) public surface area. By reducing the surface area, I have less contracts to maintain, which gives me more opportunities to refactor. In general, reducing surface area gives me a system that is better maintainable and testable and therefore has a better design.

BTW. Note that there can be multiple contracts in the same system calling upon each other. For example: a UI layer may call into an application layer. Since UI testing sucks, we rather test the application layer independently, thereby introducing new contracts besides the contracts implicit in the UI. If UI testing was easy, I might not want the additional burden of maintaining a separate set of contracts for the application layer.