Classless entities in DDD

In Domain-Driven Design (DDD) we try to create a shared model between customers, developers and code. The value of such a model is that everyone speaks the same language, reducing the chance of miscommunication and thereby, we hope, improving the quality of the software.

One of the implications of this strategy is that any model we create must actually be representable in code. That places some requirements on the programming language we’re using. For that reason, a lot of DDD examples use an object-oriented language, since those languages provide constructs, especially objects, that seem to map pretty well to the structure of a lot of domains. But doing DDD definitely doesn’t constrain you to an OO language: I see people doing DDD in F#, and Eric Evans himself refers to using Prolog for some specific domains in the blue book.

When doing DDD in an OO language, however, there seem to be some established patterns and practices for representing specific domain concepts in code, the most common probably being to always map each entity or value object in the model to a class in code. I’ve seen this pattern cause problems from time to time, and I personally don’t follow it as strictly anymore.

Let’s take the Purchase Order example from the blue book. The invariant is that a given purchase order (a domain entity) may not exceed a maximum total value. This total order value is determined by the PO’s purchase items (also entities), each consisting of a quantity, a price and a reference to a part. We model this using a Purchase Order aggregate, which is responsible for maintaining this invariant. So how do we represent this in code?

A common way of doing this is providing the following API:

[TestMethod]
public void when_changing_the_quantity_of_an_item_we_cannot_exceed_the_max_value() {
    Order order = CreateOrder(ofMaximumValue:10, ofCurrentValue:9);
    Part part = CreatePartOfValue(1);

    PurchaseItem item = order.NewItem(part, quantity:1);

    try {
        order.ChangeQuantity(item,newQuantity:2);
    }
    catch(MaximumOrderValueExceeded) {
        return;
    }

    Assert.Fail("MaximumOrderValueExceeded was not thrown when exceeding the maximum order value");
}

Here we have represented the order and its items as classes in our code. There is a problem with this particular solution, however: as an API consumer we get a reference to the PurchaseItem, which allows us to break the aggregate’s invariant by directly modifying the Quantity property on the PurchaseItem. This is undesirable, because someone less familiar with the code might actually do this. Of course, there are technical ways of preventing it, but in general those ways tend to clutter the code, and they’re just not really necessary.
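
To make the problem concrete, here’s a hypothetical bit of consumer code (assuming PurchaseItem exposes a settable Quantity property, as such designs typically do) that bypasses the aggregate root entirely:

PurchaseItem item = order.NewItem(part, quantity:1);

// No invariant check runs here, so the maximum order value can be exceeded:
item.Quantity = 200;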

We can do better by changing the API as follows:

[TestMethod]
public void when_changing_the_quantity_of_an_item_we_cannot_exceed_the_max_value() {
    Order order = CreateOrder(ofMaximumValue:10, ofCurrentValue:9);
    Part part = CreatePartOfValue(1);

    PurchaseItemId itemId = order.NewItem(part, quantity:1);

    try {
        order.ChangeQuantity(itemId,newQuantity:2);
    }
    catch(MaximumOrderValueExceeded) {
        return;
    }

    Assert.Fail("MaximumOrderValueExceeded was not thrown when exceeding the maximum order value");
}

Here we changed from passing around entities to passing around entity ids, which means that we cannot invoke operations on PurchaseItems directly anymore (besides not handing out references, we also made the class internal, to be sure). Now, what’s interesting is that from a consumer point of view, you no longer care whether the item is implemented as a class or as something else. I think this is a nice property in itself, because it hides implementation details, but besides that it actually frees up alternative implementations for the entire aggregate, meaning we’re not tied to a class per entity anymore (though we can still implement it that way, of course).

Now, would there be a reason to do that? The answer is: it depends. Like I said in the intro, the goal of DDD is to create a shared model of the domain, and we need to be able to represent that model in code. Since we have multiple options for representing a concept in code, it becomes a matter of picking the representation that models the domain most clearly in code, which is therefore the more desirable model.

Does an object fit our model of a purchase item satisfactorily? I think in this case the purchase item is more like a data structure than an object: it has data, but doesn’t really have any behavior (since that has to go through the aggregate root). So no, looking at it from the (OO) perspective where an object should have behavior working on its private data, I don’t think an object matches the domain concept very well here. Of course, in practice, a data structure is usually represented as a class as well (unfortunately), and that does actually fit the model pretty well.

An alternative implementation might use a simple Tuple, which would go as follows:

// A PurchaseItem is just data: its id, the part it refers to, and a quantity.
using PurchaseItem = Tuple<PurchaseItemId,PartNumber,int>;
class Order {
    Dictionary<PurchaseItemId,PurchaseItem> _items = new Dictionary<PurchaseItemId,PurchaseItem>();

    public void ChangeQuantity(PurchaseItemId itemId,int newQuantity) {
        if(!CheckOrderValueWithNewQuantity(itemId,newQuantity)) {
            throw new MaximumOrderValueExceeded();
        }
        _items[itemId] = UpdateQuantityForItem(itemId,newQuantity);
    }
    ...
}

This model conveys pretty clearly that there is no behavior associated with the PurchaseItems themselves, which is what we actually want to model. Also, you won’t be inclined to actually put any behavior on it, since you can’t.

In the end, working with tuples is probably gonna make the code a little less readable in C# (where are my record types?), so I might actually factor it out into a class, but in my opinion that’s a programming-language-driven compromise in model accuracy.
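
For completeness, here’s a hypothetical sketch of the elided UpdateQuantityForItem helper (the implementation is assumed; it’s not part of the original code). It also shows the main mechanical consequence of using tuples: they’re immutable, so changing the quantity means replacing the whole item.

private PurchaseItem UpdateQuantityForItem(PurchaseItemId itemId, int newQuantity) {
    PurchaseItem current = _items[itemId];
    // A tuple can't be modified, so build a new one with the same id
    // and part number, but the new quantity.
    return new PurchaseItem(current.Item1, current.Item2, newQuantity);
}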

Conclusion

The primary reason I wrote this post is that I see the one-class-per-entity solution being applied in a lot of situations where it can actually cause harm, such as in the example where you could violate the aggregate’s invariant. In any other situation, we would think twice before exposing an object to its consumers in that way. But because we think this is how things are done, we end up with these kinds of problems. We shouldn’t be doing that, and we should always stay critical of our own and other people’s solutions; they might be wrong or simply not apply to your context. I’d be happy to hear what others think about this, so please use the comments.

BTW. It often also works the other way around: people trying to cram all of an entity’s logic into one class. This is a way bigger problem, because it really clutters the aggregate’s code and you’re missing opportunities to abstract. Thus it’s perfectly valid to have classes in your domain layer that are not in your model. I’ll try to write about that later.

Identity based throttling in ASP.NET MVC

For one of our projects, we recently got sort of DoS’d by one of our client’s own (paying) customers. Somehow the rate of requests coming from this particular user increased about 500x, bringing down part of our system. We’re not sure what exactly happened; it might have been a bug on our side or something in the hardware/software configuration at that specific customer. Anyway, we needed to do something to prevent this problem from happening in the future, whatever the cause. So we came up with the idea of introducing throttling, specifically throttling based on the ASP.NET username (identity).

The basic idea is that if you, as a specific user, hit our service more than a predetermined number of times within a specific period, you’ll be shown a message that you’re being throttled.

In this post I’ll describe how we implemented this. We also open sourced the solution, which you can get via NuGet, or you can grab the code on GitHub.

Detecting the overload

The first thing we need to do is keep track of the number of incoming requests per user. As we’re using ASP.NET MVC, we can easily do this by creating an ActionFilter. At this moment we don’t want to throttle anything that doesn’t map to a controller/action, such as static resources. If we did, we could have used an HTTP module instead.

The ActionFilter basically keeps a count per user in a ConcurrentDictionary, and increments the counter whenever an authenticated user hits it.

public class UserThrottlingActionFilterAttribute : ActionFilterAttribute {
    ConcurrentDictionary<string,ConcurrentLong> _throttlePerUser = new ConcurrentDictionary<string,ConcurrentLong>();

    public int RequestsPerTimeStep{get;set;}

    public override void OnActionExecuting(ActionExecutingContext filterContext) {
        if(!filterContext.HttpContext.Request.IsAuthenticated) {
            return;
        }
                
        var username = filterContext.HttpContext.User.Identity.Name;

        var counter = _throttlePerUser.GetOrAdd(username,ConcurrentLong.Zero);

        var lastValue = counter.Increment();

        if(lastValue <= RequestsPerTimeStep) {
            return;
        }

        StartThrottling(username);
    }
}

ConcurrentLong is just a wrapper around a long, which we can increment in a thread-safe way and hold a reference to:

class ConcurrentLong {
    long _counter;

    public long Increment() {
        return Interlocked.Increment(ref _counter);
    }

    internal static ConcurrentLong Zero {
        get {
            return new ConcurrentLong();
        }
    }
}

To reset the counts, we create and assign a new ConcurrentDictionary whenever a request comes in and it’s been longer than TimeStep since the last flush. By doing this as part of the request, we don’t need a separate thread for flushing the data. We place a lock around this part of the code; I’m not sure it’s absolutely necessary, but I didn’t feel like hurting my brain over it too much:

public TimeSpan TimeStep{get;set;}
readonly object _lockObject = new object();
DateTime _lastFlush = DateTime.Now;

public override void OnActionExecuting(ActionExecutingContext filterContext) {
    if(!filterContext.HttpContext.Request.IsAuthenticated) {
        return;
    }
                
    FlushThrottleCounterIfNecessary();

    ...
}

private void FlushThrottleCounterIfNecessary() {
    lock(_lockObject) {
        if((DateTime.Now - _lastFlush) < TimeStep) {
            return;
        }

        _throttlePerUser = new ConcurrentDictionary<string,ConcurrentLong>();
        _lastFlush = DateTime.Now;
    }
}

Now that we’re able to detect that a user is overloading the system, the next step is to actually throttle the user.

Throttling the user

To throttle the user we basically used a variation on an answer to a question about throttling in ASP.NET on SO. This solution leverages the ASP.NET caching feature to track throttled users for ThrottleBackoff time:

public TimeSpan ThrottleBackoff{get;set;}

void StartThrottling(string username) {
    HttpRuntime.Cache.Add(
        username, 
        true,
        null,
        DateTime.Now.Add(ThrottleBackoff),
        Cache.NoSlidingExpiration,
        CacheItemPriority.Low,
        null
    );
}

To actually let the user know that it’s being throttled and not continue on to the action, we need to set the ActionResult from within OnActionExecuting:

public override void OnActionExecuting(ActionExecutingContext filterContext) {
    if(!filterContext.HttpContext.Request.IsAuthenticated) {
        return;
    }
            
    var username = filterContext.HttpContext.User.Identity.Name;

    if(HttpRuntime.Cache[username]!=null) {
        var result = new ViewResult {
            ViewName = "Throttling",
            ViewData = new ViewDataDictionary(),
        };

        filterContext.Result = result;
        filterContext.HttpContext.Response.StatusCode = 429; // too many requests
        return;
    }

    ...
}

Here we return a special view, “Throttling”, which contains the error message. We also reply with HTTP status code 429, Too Many Requests.

Conclusion

That’s pretty much it. If you’re looking for a way to throttle your users in an ASP.NET MVC app, look no further. I put the source on GitHub; it comes with a sample MVC app and a JMeter test script for experimentation. You can also install it via NuGet: Install-Package asp.net-user-throttling.

Some final notes:

  • It should be clear that this is not a complete (D)DoS protection strategy, it’s actually much more a way to protect yourself from someone accidentally overloading your system.
  • Since we’re using action filters, this will only work for URLs that actually map to a controller/action pair.
  • In a multi-node situation, there will be a cache and counter per node, so a user might easily do more requests than the amount you specify. For us, this wasn’t a particular problem, because it’s about orders of magnitude. More serious, though, is that you might be throttled on one node and not on the others, which would give users an unstable and annoying experience. We solved this by also setting a cookie when throttling is going on; when that cookie is present we reply with the you’re-being-throttled page as well (see the sketch after these notes).
  • In our solution, we allow users to override their throttling, again via a cookie. This is, once more, because we’re protecting against accidental errors. So if a user is being throttled while doing legitimate work, he can actually ignore the throttling.
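
A minimal sketch of the cookie-based part described in the notes above (the cookie name and details are assumptions, not necessarily identical to the open-sourced code):

public override void OnActionExecuting(ActionExecutingContext filterContext) {
    // If any node flagged this user as throttled, honor that flag here too.
    if(filterContext.HttpContext.Request.Cookies["user-throttling"] != null) {
        filterContext.Result = new ViewResult {
            ViewName = "Throttling",
            ViewData = new ViewDataDictionary(),
        };
        filterContext.HttpContext.Response.StatusCode = 429;
        return;
    }

    ...
}

void StartThrottling(string username, HttpContextBase context) {
    // Besides the per-node cache entry, set a cookie so the other nodes
    // (which keep their own cache and counters) throttle this user as well.
    context.Response.Cookies.Add(new HttpCookie("user-throttling", "1") {
        Expires = DateTime.Now.Add(ThrottleBackoff)
    });

    ...
}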

Infi Coding Dojo – Object Calisthenics

The dojo

A couple of weeks ago we held the first ever Infi coding dojo. The basic idea of a coding dojo is to get together with a bunch of coders and practice some new coding skills that you’re eager to learn but don’t normally get around to. Besides this being a cool goal in itself, we also thought it would just be plain fun. So we decided to organize one at our company, and after surveying some colleagues the die was cast: it would be about object calisthenics. We ended up with a bunch of colleagues and friends and had a great time. In this blog post I’ll dive into some of the problems we faced, and how we resolved them (or not).

The challenge was to build a simple tennis-scoring program while applying the object calisthenics rules. We picked tennis scoring because we thought it was a fairly well understood domain (which was proven wrong pretty quickly...) and because it was probably small enough to fit into a 2-3 hour session while still posing the challenges you’ll face when applying object calisthenics. I put up a description of the challenge on GitHub, so you can try it yourself.

Following the rules

Object calisthenics consists of 9 or 10 rules (depending on the article you read), which are supposed to make your code more object oriented and thereby ‘better’. I first learned about it from Fred George at BuildStuff 2013, and it really got me thinking about what OO is all about. In this post I’ll talk about our experiences with applying the rules, but if you’d like to read more theory, read the original article by Jeff Bay.

Rule 1. One level of indentation per method

This one was fairly easy to attain: whenever you were making a mess, the usual strategy was to factor the violating parts out into a new method. One interesting side effect was that this sometimes also removes the need for a comment in front of the nested block; the name of the new method simply takes over that function, which I like.
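
A hypothetical before/after illustrating that strategy (not taken from one of the actual dojo solutions):

class Player {
    public bool WonRally { get; set; }
    public int Points { get; private set; }

    public void ScorePoint() {
        Points++;
    }
}

// Before: two levels of indentation, plus a comment explaining the inner block.
void FinishRallyBefore(List<Player> players) {
    foreach(var player in players) {
        // award the point to the player who won the rally
        if(player.WonRally) {
            player.ScorePoint();
        }
    }
}

// After: one level of indentation per method; the comment became a method name.
void FinishRally(List<Player> players) {
    foreach(var player in players) {
        AwardPointWhenRallyWon(player);
    }
}

void AwardPointWhenRallyWon(Player player) {
    if(!player.WonRally) {
        return;
    }
    player.ScorePoint();
}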

One of the more interesting cases was whether a switch statement constitutes multiple levels of indentation. For example, one of the solutions had this code:

public static string Print(GameScore gameScore)
{
    switch (gameScore)
    {
        case GameScore.Zero:
            return "0";
        case GameScore.Fifteen:
            return "15";
        case GameScore.Thirty:
            return "30";
        case GameScore.Forty:
            return "40";
        case GameScore.Advantage:
            return "A";
    }
    return null;
}

Now, for some reason some people felt the return statements weren’t really violating this rule, and then argued that you could remove the violation by decreasing the level of indentation of either the case or the return statements. But I guess you could do that to remove any indentation, so I, as a good sensei, ruled it a violation (also because Fred George forbade switches in his talk, as he did with if statements...). But it raises an interesting point about the rules: sometimes it isn’t entirely clear what their goals are, which makes good arbitration hard. Switches were then eliminated using either dictionaries/hashmaps or ifs with early returns:

public static string Print(GameScore gameScore)
{
    if(gameScore==GameScore.Zero) {
        return "0";
    }

    if(gameScore==GameScore.Fifteen) {
        return "15";
    }

    ...

    return "A";
}

or

static readonly Dictionary<GameScore,String> GameScoreMap = new Dictionary<GameScore,String> {{GameScore.Zero, "0"}, {GameScore.Fifteen, "15"}, ...}; // remaining entries elided

public static string Print(GameScore gameScore)
{
    return GameScoreMap[gameScore];
}

Rule 2. Don’t use the else keyword

This one also wasn’t that hard to figure out; it was mostly resolved using the same strategies as above, that is: early returns or using a dictionary. One of the solutions applied the state pattern (which, interestingly enough, introduced a violation of rule 9: don’t use setters).
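
A minimal sketch (not one of the actual dojo solutions) of why the state pattern tends to pull in a setter-like construct: each state hands back its successor, and the context has to assign it somewhere.

interface IGameState {
    IGameState ScorePoint();
}

class Deuce : IGameState {
    public IGameState ScorePoint() { return new Advantage(); }
}

class Advantage : IGameState {
    // Simplified: losing the advantage point (back to deuce) is ignored here.
    public IGameState ScorePoint() { return new GameWon(); }
}

class GameWon : IGameState {
    public IGameState ScorePoint() { return this; }
}

class Game {
    IGameState _state = new Deuce();

    public void ScorePoint() {
        // No else anywhere, but this assignment is effectively a setter on _state.
        _state = _state.ScorePoint();
    }
}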

Again, the question was raised what the actual goal of the rule was. Most people felt the early-return strategy was kind of a hack, since there is still an implicit else. On the other hand, early returns have the interesting property that your mental image of the function only ever needs to consider two flows at most, which I personally find very valuable. Therefore, I think early returns are a valid strategy for applying this rule.

Rule 3. Wrap all primitives and strings

This was one of the more interesting rules. While it’s actually pretty easy to wrap all primitives in their own type, it’s kind of hard to do this AND comply with rule 9 (don’t use getters and setters). I’ll illustrate the problem based on a simple clock that keeps minutes and seconds (this kind of resembles the problem found in the tennis game, but keeps the terminology simpler):

    // Note: Minutes and Seconds are classes rather than structs, so that
    // the instances can be shared and mutated through references later on.
    class Clock {
        private Minutes _minutes = new Minutes(0);
        private Seconds _seconds = new Seconds(0);

        public void Tick() {...}
    }

    class Minutes {
        int _minutes;
        public Minutes(int minutes) {
            _minutes = minutes;
        }
    }

    class Seconds {
        int _seconds;
        public Seconds(int seconds) {
            _seconds = seconds;
        }

        // Two Seconds instances are equal when they hold the same value.
        public override bool Equals(object obj) {
            var other = obj as Seconds;
            return other != null && other._seconds == _seconds;
        }

        public override int GetHashCode() {
            return _seconds.GetHashCode();
        }
    }

An external source calls the Tick method on the clock every second. Now, how do we implement Tick? A straightforward solution would be:

public void Tick() { 
    _seconds.Increment();

    if(!_seconds.Equals(new Seconds(60))) {
        return;
    }

    _seconds = new Seconds(0);
    _minutes.Increment();
}

Where Seconds.Increment and Minutes.Increment are implemented by incrementing their integer field. The problem in this piece of code is the

    if(!_seconds.Equals(new Seconds(60))) {

line, because this actually constitutes checking the state of _seconds and making a decision based on it, which is exactly what rule 9 is supposed to prevent. So, even though we’re not really using a getter here, I think it’s still sort of a violation of rule 9.

A solution in accordance with the rules would be something like this:

class Clock {
    ...
    public void Tick() { 
        _seconds.Increment(_minutes);
    }
}

class Seconds {
    ...

    public void Increment(Minutes minutes) {
        _seconds++;
        if(_seconds != 60) {
            return;
        }
        _seconds = 0;
        minutes.Increment();
    }
}

Which most of us didn’t like that much either, because we felt Seconds shouldn’t really know about Minutes. Also, this gets worse if we add hours, because then we need to add Hours to the Seconds.Increment signature as well:

class Clock {
    private Hours _hours = new Hours(0);
    private Minutes _minutes = new Minutes(0);
    private Seconds _seconds = new Seconds(0);

    public Clock() { }
    public void Tick() { 
        _seconds.Increment(_minutes, _hours);
    }
}

class Hours {
    ...
}

class Minutes {
    ...
    internal void Increment(Hours hours) {
        _minutes++;
        if(_minutes != 60) {
            return;
        }

        _minutes = 0;
        hours.Increment();
    }
}

class Seconds {
    ...

    public void Increment(Minutes minutes, Hours hours) {
        _seconds++;
        if(_seconds != 60) {
            return;
        }
        _seconds = 0;
        minutes.Increment(hours);
    }
}

So now Seconds also needs to know about Hours, which most of us felt just isn’t right (but that might be caused by our collective procedural mindset). This solution, by the way, is in violation of rule 8 (no more than two instance variables). Working around that requires you to set up an even more elaborate callback structure, which I think only complicates matters further.

So, this one is really undecided. I think I’m fine with comparing wrapped primitives by value, but it’s a slippery slope. I’d love to hear some discussion or examples in the comments.

BTW. In a tennis game this same cascading occurs when one of the players completes a game or a set.

Rule 4. First class collections

I don’t think this rule actually applied to any of the final solutions, but there were some WIP solutions that violated it and corrected it later. One such case was a TennisMatch score that had two dictionaries, one for the set score and one for the game score. These were refactored into their own classes, SetScore and GameScore. The corresponding methods for incrementing the applicable score for a specific player were then deferred to those objects, with GameScore calling back into the TennisMatch when the game was complete. For an example, see: https://github.com/edeckers/ObjectCalisthenicsCodingDojo/blob/master/TennisMatchCodingDojo/SpelerScore.cs
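
A minimal sketch of that refactoring (the class and member names here are assumptions for illustration):

enum Player { One, Two }

// Before: the match holds a bare collection (a violation of rule 4).
class TennisMatchBefore {
    Dictionary<Player,int> _gamePoints = new Dictionary<Player,int>();
}

// After: the collection is wrapped in a first-class type, which also becomes
// the natural home for the behavior that works on it.
class GameScore {
    Dictionary<Player,int> _points = new Dictionary<Player,int> {
        { Player.One, 0 }, { Player.Two, 0 }
    };

    public void ScorePointFor(Player player) {
        _points[player] = _points[player] + 1;
    }
}

class TennisMatch {
    GameScore _gameScore = new GameScore();

    public void ScorePointFor(Player player) {
        _gameScore.ScorePointFor(player);
    }
}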

Rule 5. One dot per line

I think this one pretty much followed from the other rules, especially rule 9 (no getters). We found that most methods had a void return type, and since there were no getters, this rule was actually pretty hard to break.

Rule 6. Don’t abbreviate

We didn’t really encounter any problems with this one, except maybe for the occasional for-loop iterator. But I don’t think any of those made it into a final solution.

Rule 7. Keep all entities small

Due to the scope of the challenge this wasn’t really a problem. Also, I suppose this one is going to be hard to break because of the other rules. If anyone has any experience with this, please let me know.

Rule 8. No classes with more than two instance variables

Again one of the more interesting rules. When applying it, one of the design issues that comes up is along which dimension you need to segregate your classes. In the case of our tennis match you can imagine having the following class:

class TennisMatch {
    GameScore _player1GameScore;
    SetScore _player1SetScore;
    GameScore _player2GameScore;
    SetScore _player2SetScore;
}

Which violates this rule, so we need to split out some classes. Here we saw two directions: segregate by player:

class TennisMatch {
    PlayerScore _player1Score;
    PlayerScore _player2Score;
}

class PlayerScore {
    GameScore _gameScore;
    SetScore _setScore;
}

or by score:

class TennisMatch {
    ScoreInGame _scoreInGame;
    ScoreInSet _scoreInSet;
}

class ScoreInGame {
    GameScore _player1Score;
    GameScore _player2Score;
}

class ScoreInSet {
    SetScore _player1Score;
    SetScore _player2Score;
}

It turns out that segregating by score worked best, since the game’s rules involve comparing scores of the same kind. For example, when you are on 40 in a game, deciding whether the game is complete requires checking that the opponent’s score is not 40 or advantage. Checking your opponent’s score is hard if it’s a couple of objects away and you’re using object calisthenics.

I particularly liked this rule because it forces you to actually put data where it belongs. In the above situation, having the score on the player makes no sense, except that it might feel like it’s owned by the player and should therefore live on it. In the actual solution it of course still belongs to the player, but it’s coupled to the player via identity, not reference, leading to way less coupling.

Rule 9. No getters/setters/properties

This is probably the most important rule, as well as the hardest. It forces you to really put logic where it belongs. I personally interpret this rule as: don’t make decisions based on somebody else’s state. But as shown in rule 3, that becomes really hard when you involve primitives and interpret comparing (the entire object) as checking state.

Another issue that came up was whether returning a value from a (command) method counts as a getter. For example, a boolean was returned to indicate whether a game was complete, and the calling method would then increase the games for the player if it returned true. I’m a big fan of command-query separation myself, so I usually don’t code that way. But I think this should probably count as a getter. I wonder what you guys think?
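
For illustration, a hypothetical sketch (not from the dojo code, and with deuce conveniently ignored) of the tell-don’t-ask alternative: instead of returning a bool from the command, the game calls back into the set when it completes, so the caller never inspects any state.

class Game {
    int _points;
    readonly Set _set;

    public Game(Set set) {
        _set = set;
    }

    public void ScorePoint() {
        _points++;
        if(_points < 4) {
            return;
        }
        // Tell, don't ask: no bool is returned, and no state is inspected.
        _set.GameCompleted();
    }
}

class Set {
    int _gamesWon;

    public void GameCompleted() {
        _gamesWon++;
        // A real Set would in turn tell the match when it completes,
        // which is where the circular dependencies mentioned below come from.
    }
}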

In fact, when you strictly apply Tell, Don’t Ask, you’ll find that there is no need for any (public) query method. You always invoke operations on other objects, asking them to do stuff, and they potentially call back into you. We found that this introduces some circular dependencies between classes. While it’s commonly held that this is a bad thing (a smell), there doesn’t really seem to be a way around it here. So, I’m also not really sure about this one.

Another issue is how to do presentation. In fact, if you look at our example project, you’ll find there is a method returning a string representation of the score. We use this both for testing and display purposes. I think this is probably alright, but it might be seen as a violation of TDA. You could quite easily fix it by changing the method to take a TextWriter and have the method write to that, though. A more interesting question is how to change formatting: what if we want to change the order in which the scores are written, or want to display only the current game score? This is actually a very interesting question, because if you think about it a bit more, you’ll come to the conclusion that UI and business logic will always be tightly coupled. This also implies that the business logic will need to know a thing or two about the UI. Now, this really challenges the common belief that business and UI code should and can be strictly separated. I might write more about this later, but for now I’ll refer you to an article by Allen Holub, which also addresses this issue.
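
For the TextWriter variant, a minimal sketch (names assumed):

class TennisMatch {
    ...

    public void WriteScoreTo(TextWriter writer) {
        // The match is told to write itself; nobody queries its state.
        writer.Write("{0}-{1}", _scoreInGame, _scoreInSet);
    }
}

// The same method serves both display and tests:
match.WriteScoreTo(Console.Out);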

Again, it would be nice if the goal of this rule were stated somewhat more explicitly, so we could actually derive the answers to these questions ourselves.

Conclusion

All in all, we had great fun doing this coding dojo. It’s always good to try new things and have discussions with other developers. With regard to object calisthenics, I think it actually pushes you towards better OO design. It forces you to think about encapsulation in ways you probably didn’t before, and that will probably lead to better code. However, I strongly felt that the rules as they are formulated right now are open to too much interpretation, which can also guide/force you in the wrong direction. I think it would be really helpful if there were a follow-up to the original article. I must say, however, that it could also be us interpreting the rules the wrong way.

As said, you can find a lot of code (both the problem and the solutions) on GitHub. Special thanks to all participants, and I really hope we can have some cool discussions in the comments. I’m particularly interested in how other people are solving the issues demonstrated in rules 3 and 9.

REST without media types is not REST

A largely misunderstood part of REST is the role of media types. In this post I’ll go a bit deeper into these media types and explain why an API for which no media types have been defined can never be a REST API.

To give an idea of how much confusion there is about the role of media types in REST APIs: one of my heroes, Fowler, for example says that it’s perfectly possible to offer a REST API without making use of media types, while Fielding says that the use of media types is a precondition for REST. One of the reasons for this confusion is that REST means different things to different people. For some it means using HTTP verbs in your API, for others it means using JSON as the data-exchange format, for yet others it means that client and server can negotiate about the exchange format. For Fielding it means an architectural style with which you can build internet-scale applications. By scale we don’t (only) mean performance here, but also the challenges that arise because multiple organizations and people are involved. Now, Fielding introduced/defined the term REST, so if you want to call your API RESTful you will, as far as I’m concerned, have to meet the requirements he set for it.

When I bring up this position in discussions, I often hear the argument “that’s awfully purist”. Now, it’s true that purism in software development isn’t always helpful, but too often that argument is used to deviate from the standards. A common example is organizations that start with Scrum or the like and change the method a bit “because that fits the organization better”. The holistic nature of such processes is then ignored, and of course the adoption of the process fails. I could go on about this for a while, but that’s for a later post.

The same situation arises with REST. If you don’t see the importance of media types, you’re probably using REST for the wrong reason. For Fielding, REST is an architectural style that can be used for “distributed hypermedia systems” that are used “on the scale of decades”. The first question you have to ask when you want to offer a RESTful API is whether you’re going to build such a system. Probably not. Many of the APIs I encounter offer generic, well-defined services to external parties. Think of integrations with e-mail and payment providers, weather services, reservation systems, etc. These generally consist of a few calls that offer very specific functionality and whose interfaces are very stable. RPC is very well suited for such integrations. The constraints of REST, on the other hand, are overkill for such calls, mainly because using caching/intermediaries for these calls is often not even desirable, and there will be close contact between supplier and consumer anyway, making individual evolvability much less important. And those are exactly the two big advantages of REST.

Anyway, suppose you’ve decided to offer a RESTful API after all: can that do any harm? The answer is a definite yes, it can. The reason is that a REST API will cost you much more development effort (and thus money) than a comparable API via, say, SOAP. Why? Because SOAP is precisely a standard for doing RPCs. If the functionality you want to offer looks a lot like RPC, SOAP is a very natural solution. The only thing you’ll have to document is the semantics of your API: what the nouns and the operations mean, but not how client and server should talk to each other or what fundamental data structures look like on the wire.

Moreover, because SOAP is a standard, there’s a lot of tooling and support on the market. Using WSDLs and IDEs it’s very easy to build an integration between a SOAP client and server. Visual Studio, for example, can generate a WSDL from your service class/interface, which can then be consumed by a PHP client, giving you a working solution in 10 minutes. Contrast this with REST, where you’ll have to document much more: not only the semantics, but also how to navigate through the API and what operations are possible.

It’s sometimes thought that in a REST API the HTTP verbs and status codes already have a predefined meaning for every arbitrary URI/resource. This is also often cited as an advantage, because “any client that speaks HTTP can communicate with the API”. If you think about it for a bit, though, you’ll quickly come to the conclusion that this can’t be true. How can you talk to something when you don’t know what it is, what it can do, or what it does? The distinction lies in hearing versus understanding: client and server will hear each other, but won’t understand each other. To also understand each other, you need more information. The added value of using uniform verbs and status codes is that intermediaries know how to treat the requests/responses, for example that a GET is safe but a POST is not, and that a 4xx return code means a client error. Incidentally, these definitions are part of HTTP, not so much of REST. REST only talks about a “uniform interface”, and the verbs and status codes in HTTP are one implementation of that.

And that brings us to the crux of this post: you cannot do REST without media types. The media types are the documentation, and indeed the only documentation, of your API. They describe how to navigate through your API, what the possibilities are, and what the semantic meaning of everything is. A client and server will therefore never get anything done without both of them knowing the media type. If you don’t believe it, I invite you to pick up, for example, the AtomPub RFC. Look at section 4.3 about the available verbs and what they do, and at section 4.4, which says the remaining verbs have no meaning according to that RFC. Fielding also says something about this in the context of HTML: “anchor elements with an href attribute create a hypertext link that, when selected, invokes a retrieval request (GET) on the URI corresponding to the CDATA-encoded href attribute.”

When decisions are made based on other information, such as the URI structure, this is called out-of-band information. Using such information is not RESTful, because the server must be able to keep evolving independently.

If an API uses, for example, application/json or application/xml as the Content-Type (without a Link: <location>; rel="profile" header), you need out-of-band information to interpret the data. The problem with these types is that they say nothing about their content (other than that it’s formatted as XML/JSON), and therefore nothing about how the client should interpret it. Here too, the URI is often used as an indication of what the data is, but that is, again, not RESTful.
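
To make this concrete, here is a hypothetical pair of responses for the same resource (the URI and the vendor media type are made up for illustration). With the first response, knowing what the fields mean and what you can do next requires out-of-band information; with the second, that knowledge lives in the (documented) media type:

HTTP/1.1 200 OK
Content-Type: application/json

{"status": "open", "total": 9}

HTTP/1.1 200 OK
Content-Type: application/vnd.acme.purchase-order+json

{"status": "open", "total": 9,
 "links": [{"rel": "add-item", "href": "/orders/42/items"}]}

A client that knows application/vnd.acme.purchase-order+json can learn from that media type’s specification what "status" means and how to use the "add-item" link; a client of the first response has to get that knowledge from somewhere else, typically the URI structure or separate documentation.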

So REST is unfortunately not a silver bullet for self-documenting APIs. It does tell you how to document, namely with media types. All documentation is captured in the media type, and if you don’t offer one it’s impossible to build a RESTful API. Defining these media types doesn’t necessarily have to be a lot of work, but it’s always more work than documenting a SOAP API, because SOAP already offers a standard for invoking actions and for representing well-known data types.

Because of this documentation effort, building a REST API will always be more work than building a corresponding API in SOAP. So you’ll need a good reason to do it (“distributed hypermedia systems”, “on the scale of decades”). And if you have that reason, you’ll also have to go the full monty, including media types (in fact, that’s where you should start), to actually reap those benefits. If you don’t, you’ll end up with an RPC API that nobody can integrate with in a standard way, and you’ll have done a lot of work that’s already solved by frameworks/libraries, namely RPC.

The moral of this story is that although building RESTful APIs is hot these days, it’s wise to be cautious about it. The advantages it offers are far from always needed, while it will definitely cost more time, money and energy. And those are probably better spent on other things.

PS. If you don’t want SOAP because you’d rather talk JSON, as in the browser or on mobile devices, there are other options. WCF, for example, offers a JSON binding, and there are SOAP-to-JSON proxies.