Posts in "Uncategorized"

Uncategorized

New design

June 28, 2015 0 Comment Freek Paans

Hi all,

I updated the blog with a new design. The old one was just the default wordpress theme, and I thought it would be nice to have something a little bit more custom. I personally like big fonts when reading blogs, so that’s what I went for. If you have any feedback, please let me know 🙂

Cheers,

-Freek

Architecture Uncategorized

Special case pattern, primitive obsession and accidental design

March 11, 2015 0 Comment Freek Paans

I define accidental design as a design that’s present in a software system, but was not consciously chosen for. Instead, it’s driven by other factors, such as tooling, co-workers, past experience, existing code and a whole lot of other reasons. Two of the most common cases I encounter have to do with the special case pattern and primitive obsession.

For this blog post I’ll use examples from the Payroll Case Study.

Special case pattern

The special case pattern is all about null checks. Everybody’s written code like this:

The problem with this code is that I have no clue about the semantic meaning of the if statement. It could be that that an Employee with the specified Id wasn’t found, or it could mean that it was found but just isn’t active anymore, or both. The special case pattern is a way to make your intent more explicit:

It’s more clear what we’re checking here: whether the employee was found or not. Personally, I prefer to decouple the Employee interface from having to know about NotFound states and such, so I’d have GetEmployee return an GetEmployeeResult:

Anyway, what does this have to with accidental design? Well, the entire reason we can check for null is because we’ve defined Employee as a reference type. That means a variable of that type can actually be null. Had it been a struct, or had the language not supported null for reference types, we would have definitely designed it in a different, probably more explicit, way. For example, let’s say we have a method printing the Maximum value for a set of ints, or “Not available” if it’s empty. I’m pretty sure nobody would ever implement that as follows:

Instead, we would probably design something to handle the special case of an empty set result explicitly, be it by an IntResult, use a TryGetMax method or making the returned int nullable. The latter might seem contradictory, but the point is that our code makes explicit that we will not always be able to get a maximum value from an int set. In this case we have actually thought about representing the real range of results, whereas by just defaulting to return a null whenever an edge case occurs, we haven’t. The latter situation is accidental design since we did actually make a design decision, but not consciously.

Primitive obsession

Primitive obsession is the tendency to express most of your code in terms of language primitives such as int, float, double, decimal, string, DateTime, Timespan, etc. There are 3 common sources of problems with this approach, nicely illustrated by the following code from the payroll case study:

How often have you introduced bugs because you mixed up the order of the parameters, and accidentally mixed up the salary and the commission rate? I can say I have. Also, what is the commission rate expressed in, is it a percentage or fraction? What if we want to change the representation of the percentage commission rate to an int? We’ll have to track and change it everywhere.

The underlying problem is that we’re working at the wrong level of abstraction for this problem. ints, decimals, strings, etc are at the level of abstraction of the language/platform, but we’re working at the level of the business here. Things would be much better if we’d change it to something like this:

This way, type safety would prevent us from mixing up the parameters, we can create nice explicit factory methods on our Salary and CommissionRate classes, and changes in data representation would be local to those objects only. Besides that, aren’t we told in high school that an number without a unit is meaningless?

Again, this is accidental design at work: our environment provides us with these primitive types, so we tend to use them. These primitive types, however, are not designed by you. Had they been different, your design would’ve been much different. How much of that would you be comfortable with?

One more example of this is the lack of a Date type in .NET. Because of this we tend to represent all dates as DateTime’s. This has caused me headaches in more than one project, and could’ve been mitigated had we created a Date type ourselves (it’s even worse when you deal with Unix timestamps).

Conclusion

Accidental design might sound like a bad thing in general, but to be honest I actually think a lot of design “decisions” are made that way. If it wasn’t, we’d probably never ship. Still, I think it’s good practice to be aware of as many of them as you can, and know your options when you actually do need to make something more explicit. Other areas where I see a lot of accidental design (or architecture) are choices for languages, frameworks and database technology. I might write about those in a later post.

With regard to the examples: In the case of GetEmployee method, it might actually be a pretty good convention to return null when the specified Employee couldn’t be found. But by being a convention, it’s actually become a conscious decision. Also, an argument I’ve heard against wrapping every primitive is that it introduces needless complexity and a reduced overview of your project due to the proliferation of classes. Those are sound arguments, but in my experience they don’t weigh up against the benefits.

Uncategorized

Identity based throttling in ASP.NET MVC

October 25, 2014 0 Comment Freek Paans

For one of our projects, we recently got sort of DoS’d by one of our client’s own (paying) customer. Somehow the rate of requests coming from this particular user increased about 500x, bringing down part of our system. We’re not sure what exactly happened, it might have been a bug at our side or due to the hardware/software configuration at the specific customer. Anyway, we needed to do something to prevent this problem from happening in the future, whatever the cause. So we came up with the idea of introducing throttling, and specifically throttling based on the ASP.NET username (identity).

The basic idea is that if you, as a specific user, hit our service more than a predetermined number of times within a specific period, you’ll be shown a message that you’re being throttled.

In this post I’ll describe how we implemented this. We also open sourced the solution, which you can get on nuget or the code via GitHub.

Detecting the overload

The first thing we need to do is keep track of the number of incoming request per user. As we’re using ASP.NET MVC, we can easily do this by creating an ActionFilter. At this moment we don’t want to throttle on non-controller/actions, such as static resources. If we did, we could’ve used HTTP Modules.

The ActionFilter basically keeps a the count per user in a ConcurrentDictionary, and increments the counter whenever an authenticated user hits it.

public class UserThrottlingActionFilterAttribute : ActionFilterAttribute{
    ConcurrentDictionary<string,ConcurrentLong> _throttlePerUser = new ConcurrentDictionary<string,ConcurrentLong>();

    public int RequestsPerTimeStep{get;set;}

    public override void OnActionExecuting(ActionExecutingContext filterContext) {
        if(!filterContext.HttpContext.Request.IsAuthenticated) {
            return;
        }
                
        var username = filterContext.HttpContext.User.Identity.Name;

        var counter = _throttlePerUser.GetOrAdd(username,ConcurrentLong.Zero);

        var lastValue = counter.Increment();

        if(lastValue <= RequestsPerTimeStep) {
            return;
        }

        StartThrottling();
    }
}

ConcurrentLong is just a wrapper around a long, which we can increment in a threadsafe way and have a reference to:

class ConcurrentLong {
    long _counter;

    public long Increment() {
        return Interlocked.Increment(ref _counter);
    }

    internal static ConcurrentLong Zero {
        get {
            return new ConcurrentLong();
        }
    }
}

To reset the counts, we create and assign a new ConcurrentDictionary every time a request is made and it’s been longer than TimeStep since the last flush. By doing it as part of the request, we don’t need a separate thread for flushing the data. We place a lock around this part of the code, though I’m not sure its absolutely necessary, but I didn’t feel like hurting my brain over it too much:

public TimeSpan TimeStep{get;set;}
readonly object _lockObject = new object();
DateTime _lastFlush = DateTime.Now;

public override void OnActionExecuting(ActionExecutingContext filterContext) {
    if(!filterContext.HttpContext.Request.IsAuthenticated) {
        return;
    }
                
    FlushThrottleCounterIfNecessary();

    ...
}

private void FlushThrottleCounterIfNecessary() {
    lock(_lockObject) {
        if((DateTime.Now - _lastFlush) < TimeStep) {
            return;
        }

        _throttlePerUser = new ConcurrentDictionary<string,ConcurrentLong>();
        _lastFlush = DateTime.Now;
    }
}

Now that we’re able to detect that a user is overloading the system, the next step is to actually throttle the user.

Throttling the user

To throttle the user we basically used a variation on an answer to a question about throttling in ASP.NET on SO. This solution leverages the ASP.NET caching feature to track throttled users for ThrottleBackoff time:

public TimeSpan ThrottleBackoff{get;set;}

void StartThrottling(string username) {
    HttpRuntime.Cache.Add(
        username, 
        true,
        null,
        DateTime.Now.Add(ThrottleBackoff),
        Cache.NoSlidingExpiration,
        CacheItemPriority.Low,
        null
    );
}

To actually let the user know that it’s being throttled and not continue on to the action, we need to set the ActionResult from within OnActionExecuting:

public override void OnActionExecuting(ActionExecutingContext filterContext) {
    if(!filterContext.HttpContext.Request.IsAuthenticated) {
        return;
    }
            
    var username = filterContext.HttpContext.User.Identity.Name;

    if(HttpRuntime.Cache[username]!=null) {
        var result = new ViewResult {
            ViewName = "Throttling",
            ViewData = new ViewDataDictionary(),
        };

        filterContext.Result = result;
        filterContext.HttpContext.Response.StatusCode = 429; // too many requests
        return;
    }

    ...
}

Here we return a special view “Throttling” which contains the error message. Also we reply with HTTP Status Code 429, Too Many Requests.

Conclusion

That’s pretty much it. If you’re looking for a way to throttle your users in an ASP.NET MVC app, don’t look any further. I put the source on GitHub, it comes with a sample MVC app and a jmeter test script for experimentation. You can also install it via nuget: Install-Package asp.net-user-throttling.

Some final notes:

It should be clear that this is not a complete (D)DoS protection strategy, it’s actually much more a way to protect yourself from someone accidentally overloading your system.
Since we’re using action filters, this will only work for URLs that actually map to a controller/action pair.
In a multi-node situation, there will be a cache and counter per node, so a user might easily do more requests than the amounts you specify. For us, this wasn’t a particular problem because it’s about orders of magnitude. More serious though, is that you might be throttled on 1 node and not on the others. This would give an unstable and annoying experience to the users. We solved this by actually setting a cookie that throttling is going on and when that’s present we also reply with the you’re being throttled page.
In our solution, we allow the user to override its throttling, also via cookie. This is again because we’re protecting against accidental errors. So if a user is being throttled while doing legitimate work, he can actually ignore the throttling.

Uncategorized

Infi Coding Dojo – Object Calisthenics

July 11, 2014 0 Comment Freek Paans

The dojo

A couple of weeks ago we held the first ever Infi coding dojo. The basic idea of a coding dojo is to get together with a bunch of coders and practice some new coding skills that you’re eager to learn, but don’t normally get around doing. Besides this being a cool goal in itself, we also thought it would just be plain fun. So we decided to organize one at our company, and after surveying some colleagues the dice was thrown to let it be about object calisthenics. We ended up with a bunch of colleagues and friends and had a great time. In this blog post I’ll dive into some of the problems we faced, and how we resolved them (or not).

The challenge was to build a simple tennis-scoring program while applying the object calisthenics rules. We picked tennis-scoring because we thought it was a fairly well understood domain (which was proved wrong pretty quickly..) and it was probably small enough to fit into a 2-3 hour session while still posing the challenges you’ll face when applying object calisthenics. I put up a description of the challenge on github, so you can try doing it yourself.

Following the rules

Object calisthenics is about 9 or 10 rules (depending on the article you read), which are supposed to make your code more object oriented and thereby ‘better’. I first learned about it by Fred George at BuildStuff 2013 and really got me thinking about what OO is all about. In this blog, I’ll talk about our experiences with applying the rules, but if you’d like to read more theory, read the original article by Jeff Bay.

Rule 1. One level of indentation per method

This one was fairly easy to attain. Just whenever you were making a mess, your usual strategy would be to factor out the violating parts into a new method. One interesting side-effect of this was that this sometimes also removes the need of comment in front of the nesting; the name of the new method would simply take over this function, which I like.

One of the more interesting cases was whether a switch statements constitutes multiple levels of indentation. For example, one of the solutions had this code:

public static string Print(GameScore gameScore)
{
    switch (gameScore)
    {
        case GameScore.Zero:
            return "0";
        case GameScore.Fifteen:
            return "15";
        case GameScore.Thirty:
            return "30";
        case GameScore.Fourty:
            return "40";
        case GameScore.Advantage:
            return "A";
    }
    return null;
}

Now, for some reason some people felt the return statements weren’t really violating this rule, and then argued that you could remove the violation by decreasing the level of indentation of either the case or the return statement. But I guess you could do that to remove any indentation, so I, as a good sensei, ruled it a violation (this is also because Fred George forbade them in his talk, as he did with if statements..). But it raises an interesting point about the rules: sometimes it isn’t entirely clear what their goals are, making good arbitration hard. Switches then would be eliminated using either Dictionaries/Hashmaps or ifs with early returns:

public static string Print(GameScore gameScore)
{
    if(gameScore==GameScore.Zero) {
        return "0";
    }

    if(gameScore==GameScore.Fifteen) {
        return "15";
    }

    ...

    return "A";
}

static GameScoreMap = new Dictionary<GameScore,String> {{GameScore.Zero, "0"}, {GameScore.Fifteen, "15"} ...} //note this does not actually work, should be done from static ctor

public static string Print(GameScore gameScore)
{
    return GameScoreMap[gameScore];
}

Rule 2. Don’t use the else keyword

This one also wasn’t that hard to figure out, it was mostly resolved using the same strategies as above, that is: early returns or using a dictionary. One of the solutions applied the state pattern (which interestingly enough, introduced a violation of rule 9: don’t use setters).

Again, the question was raised what the actual goal of the rule was. Most people felt the early return strategy was kind of a hack since there is still an implicit else. On the other hand, early returns in itself have the interesting property that your mental image of the function only needs to consider 2 flows at max, which I personally find very valuable. Therefor, I think early returns are a valid strategy to apply this rule.

Rule 3. Wrap all primitives and strings

This was one of the more interesting rules. While it’s actually pretty easy to wrap all primitives in their own type, it’s kind of hard to do this AND comply to rule 9 (don’t use getters and setters). I’ll illustrate the problem based a simple clock which keeps minutes and seconds (this kind of resembles the problem found in the tennis game, but keeps the terminology simpler):

    class Clock {
        private Minutes _minutes = new Minutes(0);
        private Seconds _seconds = new Seconds(0);

        public void Tick() {...}
    }

    struct Minutes {
        int _minutes;
        public Minutes(int minutes) {
            _minutes = minutes;
        }
    }

    struct Seconds {
        int _seconds;
        public Seconds(int seconds) {
            _seconds = seconds;
        }
    }

An external source calls the tick method on the clock every second. Now, how do we implement tick? A straightforward solution would be

public void Tick() { 
    _seconds.Increment();

    if(!_seconds.Equals(new Seconds(60))) {
        return;
    }

    _seconds = new Seconds(0);
    _minutes.Increment();
}

Where Seconds.Increment and Minutes.Increment are implemented by incrementing their integer field. The problem in this piece of code is the

    if(!_seconds.Equals(new Seconds(60))) {

line, because this actually constitutes checking the state of _seconds and making a decision based on it, which is exactly what rule 9 is supposed to prevent. So, even though we’re not really using a getter here, I think it’s still sort of a violation of rule 9.

A solution in accordance with the rules would be something like this

class Clock {
    ...
    public void Tick() { 
        _seconds.Increment(_minutes);
    }
}

struct Seconds {
    ...

    public void Increment(Minutes minutes) {
        _seconds++;
        if(_seconds != 60) {
            return;
        }
        _seconds = 0;
        minutes.Increment();
    }
}

Which most of us didn’t like that much either, because we felt Seconds shouldn’t really know about Minutes. Also, this gets worse if we add hours, because then we need to add Hours to the Seconds.Increment signature as well:

class Clock {
    private Hours _hours = new Hours(0);
    private Minutes _minutes = new Minutes(0);
    private Seconds _seconds = new Seconds(0);

    public Clock() { }
    public void Tick() { 
        _seconds.Increment(_minutes, _hours);
    }
}

struct Hours {
    ...
}

struct Minutes {
    ...
    internal void Increment(Hours hours) {
        _minutes ++;
        if(_minutes != 60 ){
            return;
        }

        _minutes = 0;
        hours.Increment();
    }
}

struct Seconds {
    ...

    public void Increment(Minutes minutes, Hours hours) {
        _seconds++;
        if(_seconds != 60) {
            return;
        }
        _seconds = 0;
        minutes.Increment(hours);
    }
}

So now Seconds needs to also know about Hours, which most of us feel just isn’t right (but might be caused by our collective procedural mindset). This solution, by the way, is in violation of rule 8 (no more than two instance variables). Working around that requires you to setup an even more elaborate callback structure, which I think definitely complicates matters worse.

So, this one is really undecided. I think I’m fine with comparing primitive’s values, but I think it’s a slippery slope. I’d love to hear some discussion or examples in the comments.

BTW. In a tennis game this occurs when one of the players completes a game or a set.

Rule 4. First class collections

I don’t think this rule actually applied in any of the final solutions, but there were some WIP solutions that violated this rule and corrected it later. One of the cases where there was a TennisMatch score which had 2 dictionaries, one for the set score, and one for the game score. These where refactored in their own classes SetScore and GameScore. The corresponding methods for incrementing the applicable score for a specific player was then deferred to those objects, having GameScore call back into the TennisMatch when the game was complete. For an example, see: https://github.com/edeckers/ObjectCalisthenicsCodingDojo/blob/master/TennisMatchCodingDojo/SpelerScore.cs

Rule 5. One dot per line

I think this one pretty much followed from other rules, especially rule 9 (no getters). We found that most methods had a void return type and since there were no getters, this one was actually pretty hard to break.

Rule 6. Don’t abbreviate

We didn’t really encounter any problems with this one, except maybe for the occasional for loop iterator. But I don’t think any of those made a final solution.

Rule 7. Keep all entities small

Due to the scope of the challenge this wasn’t really a problem. Also, I suppose this one is going to be hard to break due to the other rules. If anyone has any experience with this, please let me know.

Rule 8. No classes with more than two instance variables

Again one of the more interesting rules. When applying this rule, one of the design issues that comes among what dimension you need to segregate your classes. In the case of our tennis match you can imagine having the follow class:

class TennisMatch {
    GameScore _player1GameScore;
    SetScore _player1SetScore;
    GameScore _player2GameScore;
    SetScore _player2SetScore;
}

Which violates this rules, so we need to split out some classes. Here we saw two directions, segregate by player:

class TennisMatch {
    PlayerScore _player1Score;
    PlayerScore _player2Score;
}

class PlayerScore {
    GameScore _gameScore;
    SetScore _setScore;
}

or by score:

class TennisMatch {
    ScoreInGame _scoreInGame;
    ScoreInSet _scoreInSet;
}

class ScoreInGame {
    GameScore _player1Score;
    GameScore _player2Score;
}

class ScoreInSet {
    SetScore _player1Score;
    SetScore _player2Score;
}

It turns out that segregating by score worked best since the game’s rules involve comparing scores of similar kind. For example, when deciding if a game is complete when you are on 40 in a game you need to check if the opponent’s score is not 40 or on advantage. Checking your opponent’s score is hard if it’s a couple of objects away and using object calisthenics.

I particularly liked this rule because it forces you to actually put data where it belongs. In the above situation having it on the player makes no sense except that it might feel like it’s owned by the player and should therefore be on it in. In the actual solution it of course still belongs to the player, but it’s coupled to it via identity, not reference, leading to way less coupling.

Rule 9. No getters/setters/properties

This is probably the most important rule, as well as the hardest. It forces you to really put logic where it belongs. I personally interpret this rule as: don’t make decisions on somebody else’s state, but as shown in rule 5, that becomes really hard when you involve primitives and interpret comparing (the entire object) as checking state.

Another issue that came up was whether returning a value from a (command) method counts as a getter. For example, a boolean was returned to indicate if a game was complete. Then, the calling method would increase the games for the player if that returned true. I’m a big favorite of command-query seperation myself, so usually don’t code that way. But I think this should probably count as a getter. I wonder what you guys think?

In fact, when you strictly apply Tell, don’t ask, you’ll find that there is no need for any (public) query method. You’ll always invoke operations on other objects, asking them to do stuff, and potentially call back into you. We found that this introduces some circular dependencies between classes. While it’s a common conception that this is a bad thing (a smell), there’s not really a way around this. So, also not really sure about this one.

Another issue is how to do presentation. In fact, if you look at the our example project, you’ll find there is a method returning a string representation of the score. We use this both for testing and display purposes. I think this is probably alright, but it might be seen as a violation of TDA. You could quite easily fix this changing the method to take a TextWriter and have the method write to that, though. A more interesting question that comes up is how to change formatting: what if we want to change the order in which the score are written, or want to display only the current game score. Now, this is actually a very interesting question, because if you think about it a bit more, you’ll come to the conclusion that UI and business logic will always be tightly coupled. This also implies that the business logic will need to now a thing or two about the UI. Now, this really challenges the common belief that business and UI code should and can be strictly separated. I might write more about this later, but for now I’ll refer you to an article by Allen Holub, which also addresses this issue.

Again, it would be nice if the goal of this rule would be stated somewhat more explicitly, so we can actually derive answers to these questions ourselves.

Conclusion

All in all, we had great fun doing this coding dojo. It’s always good to try new things and have discussions with other developers. With regard to object calisthenics, I think it actually pushes you to better OO design. It forces you to think about encapsulation in ways you probably didn’t before, and that will probably lead to better code. However, I strongly felt the rules as they are formulated right now are open for too much interpretation, which can also guide/force you in the wrong direction. I think it would be really helpful if there was a follow up on the original article. I must say, however, that it could also be us just interpreting the rules in the wrong way.

As said, you can find a lot of code (both the problem and the solutions) on github. Special thanks to all participants and I really hope we can have some cool discussions in the comments. I am particularly interested how other people are solving the issues demonstrated in rule 3 and 9.