The Logical vs the Physical: Layered Architecture

A lot of mistakes and confusion in software development arise from conflating logical and physical concerns. For example, an object in a conceptual (logical) object model is usually implemented by multiple objects in the implementation (physical) model. Not knowing this distinction might entice you to include all the implementation classes in the conceptual model, or to factor your implementation classes poorly. One layer down, the same mistake is made again when mapping your objects to a data model.

This kind of mistake is so common that I’ve decided to write a little about it whenever I see it happening in practice. In this first post, I’ll talk about it in the context of layered architecture.

Layered architecture

I don’t think the layered architecture style (or its cousin, hexagonal architecture) needs much of an introduction:
[diagram: layered architecture]
The UI layer is responsible for visualizing things, the application layer for managing application state, the domain layer for enforcing business rules, and the data layer for managing data. Every layer builds on top of the layers below it: it can know about those lower layers, but not about the layers above it. Depending on your point of view, the data layer is sometimes thought of as an infrastructure layer that runs beside all the other layers. For the purpose of this discussion, that doesn’t matter much.

Logical vs Physical

So is the separation of layers a logical or a physical separation? This question raises another one: what does it mean to belong to a certain layer? Am I in the application layer because my code is in a namespace called MyAwesomeApp.AppLayer, or am I in the application layer because I behave according to its rules: I don’t know about the UI layer, and I don’t enforce business rules or do data management?

When it’s stated like that, you’ll probably agree it should be the latter. It doesn’t matter that much where you’re physically located, only what you logically do. Yet this is completely contrary to what I encounter in a lot of code bases.

A common sign is having your projects (or JARs, or packages, or namespaces, or whatever) named after their layers: MyAwesomeApp.UI, MyAwesomeApp.Application, MyAwesomeApp.Domain, MyAwesomeApp.DAL, etc. The UI package contains your HTML, JavaScript and web endpoint code; the application layer contains a couple of application services; the domain layer hosts your business rule code, presumably in the form of a domain model; and the DAL handles all the interactions with a given database.

So when is this a problem? Well, it doesn’t really have to be a problem, as long as you never need to do any, say, business logic in your UI or DAL package. But how often is that the case? In almost every project I work on there’s at least a bit of client-side (JavaScript) validation going on, which is definitely a business rule. Yet this code lives in the UI layer. Or I might have a unique username constraint in my database (which is part of the DAL), which is also a business rule. I’m pretty sure this happens in any project of practical size.
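To make that concrete, here’s what such a client-side business rule typically looks like (a minimal sketch; the rules and field names are made up for illustration):

```javascript
// Client-side validation living physically in the UI layer, while the
// rules themselves are clearly domain logic. Rules and names are
// hypothetical examples.
function validateOrder(order) {
  const errors = [];
  // "An order must contain at least one item" is a business rule...
  if (order.items.length === 0) {
    errors.push('An order must contain at least one item.');
  }
  // ...and so is "a discount code only applies above a minimum amount".
  if (order.discountCode && order.total < 20) {
    errors.push('Discount codes require a minimum order total of 20.');
  }
  return errors;
}
```

Physically this runs in the browser; logically it belongs to the domain layer.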

Now, this isn’t really a big problem in itself either. First, the naming is off: there’s business logic in the UI and DAL packages, which you’d expect in the domain layer, and that probably confuses newcomers. Second, we might ‘forget’ to apply a layered design inside the individual layers, since it looks like we’ve given up on the architecture for this feature anyway (we’re not putting the code in the appropriate package). That leaves that part of the code less well designed.

A real problem occurs, however, when we dogmatically try to put the business rules in the domain layer anyway, an idea often strengthened by the (equally dogmatic) need to keep things DRY. Ways I’ve seen this go wrong include:

  • Putting JavaScript code in the domain package, and somehow merging it with the rest of the JS code at build time. This causes major infrastructure headaches.
  • Enforcing unique constraints and the like in memory. This doesn’t perform.
  • Refraining from purely client-side validations and always going through the backend via AJAX. Users won’t like the latency.
  • Doing a lot of pub/sub with very generic types to be able to talk to another layer you shouldn’t know about anyway. In code bases like this, I have no idea what’s going on at runtime anymore.
  • Endless discussions among developers about where a given piece of code should go. This is just a waste of time.

And these are probably just the tip of the iceberg.

We can prevent this problem by allowing our packages to not map 1-to-1 onto the logical layers. In that case, it’s an easy discussion where code should go. We do need a little more discipline within each package to still apply a layered architecture, but at least it will be an honest story. Another method is to package based on the role a given set of code plays: one package deals with client-facing concerns, another with database concerns, and yet another with the domain model. You might think these correspond very closely to the UI/application/domain/data layers, and you’d be right, but the intention is different: the fact that one package handles database concerns also means it can be responsible for handling the business rules that are most efficiently implemented in a database.
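Packaged by role, such a structure might look something like this (the names are hypothetical):

```
MyAwesomeApp.Client    -- client-facing concerns (HTML, JS, endpoints),
                          including client-side business rules
MyAwesomeApp.Model     -- the domain model proper
MyAwesomeApp.Database  -- database concerns, including business rules
                          that are most efficiently enforced there
                          (e.g. unique usernames)
```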

This way, your physical package structure doesn’t have to reflect the logical layers of your layered architecture. It’s OK to have some business rules in your JavaScript or database, and they still belong to the domain layer when you do. However, make sure you still maintain the constraints of the layered architecture within that physical package as well.

Conclusion

This is just one example of how conflating logical and physical concerns can cause big problems in your code base. By untangling the two, we can feel less dirty when we put a certain piece of code in the ‘wrong’ layer, and at the same time build more effective software.

Moving away from legacy, DDD style

One of the things I like about DDD is that it has solutions for a wide variety of problems. One of those is how to handle legacy software, and specifically how to move away from it. There’s a passage in the book about this subject, as well as some additional material online.

This year I’m planning to apply some of these principles and techniques to a project we’ve developed and use internally at Infi: our time tracker. This tool has been under development for 8+ years now, and throughout the years it’s become ever harder to add new functionality. There are various reasons for this, such as outdated technology, a lack of design vision, and the software trying to solve many separate problems with just one model. So there’s been pressure to replace this system for a while now, and doing so via DDD practices seems both natural and fun.

This is going to be more of a journey than a project, so I’ll try to keep you updated during the year.

DDD style legacy replacement

The DDD style approach to moving away from legacy is to first and foremost focus on the core domain. We shouldn’t try to redesign the whole system at once, or try to refactor ourselves out of the mess, since that hardly ever works. Besides, there is probably a lot of value hidden in the current non-core legacy systems, and it doesn’t make sense to rewrite that since it’s been working more or less fine for years and we don’t actually need new features in these areas.

Instead, we should identify the actual reasons why we want to move away from the legacy system, and what value it’s going to bring us when doing so. More often than not, the reason will be deeply rooted in the core domain: maybe we’re having problems delivering new features due to an inappropriate model, maybe the code is just really bad, etc. Whatever the reason, the current system is holding back development in the core domain, and that’s hurting the business.

So how do we approach this? The aforementioned resources provide a couple of strategies, and they all revolve around one basic idea: create a nice, clean, isolated environment for developing a new bounded context. This new bounded context won’t be encumbered by existing software or models, and in it we can develop a new model that addresses the problems we’d like to solve in our core domain.

The goal

So what are our reasons for wanting to replace our current application? Well, you can probably imagine that time tracking is very important to us, since it’s what we use to bill our clients. We also use it internally to measure all sorts of things and make management decisions based on that data. This makes time tracking a key process in our organization. To be fair, it’s not mission-critical, but it’s still important enough to consider it Core Domain.

For our goals, time tracking is only effective if entries are both accurate and entered in a timely manner. I think the number one way to stimulate this is to make tracking your time as easy and convenient as possible. We can improve on this by creating a model of the way people actually spend their time in our company. By having deep knowledge of the way time is spent, I envision the model being able to, for example, provide context-sensitive suggestions or notify people at sensible moments that time entry is due. Having these kinds of features would make tracking your time a little less of a burden.

The plan

Let’s look at our current context map:

[diagram: initial context map]

There are currently four Bounded Contexts:

  • Time database (the TTA). This is the current application. I’ve declared it a big ball of mud, since I don’t think there’s a consistent model hidden in there, and frankly I don’t care. This application is currently used for entering your times, generating exports, managing users, etc.
  • Client reporting. Client reporting is concerned with regularly giving updates to our clients about how we spend our time. It gets its data from the TTA, but uses a different model for the actual reporting step, which is why it’s a separate BC. Most of the work with this model is manual, in Excel.
  • Invoicing. While the TTA currently has functionality for generating invoices, we don’t directly use that for sending invoices to our customers. We use data from the TTA, but then model that differently in this context. Again, this is mostly manual work.
  • Management reporting. This is what we use to make week-to-week operational decisions, and uses yet another model. This is actually an API that directly queries the TTA database.

I’m not planning on replacing the entire existing application for now, just the parts that have to do with time entry. Reporting, for example, is out of scope.

We see all BCs as partners, because all of their functions are required to successfully run the company. It’s probably possible to unify the “satellite” models, but we don’t care about that now, since we want to focus on the core domain of actually doing the time tracking.

For the new system, we’re going to try the “Bubble Context with an ACL-backed repository” strategy, and hope we can later evolve it to one of the other strategies. The destination context map will look like this:
[diagram: new context map]
The new BC will contain all the new code: an implementation of the model as well as a new GUI. For lack of a better name, I’ve called it Time-entry for now.
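To give an idea of what the ACL-backed repository means in code, here’s a rough sketch (all names are hypothetical, and it’s in JavaScript rather than Clojure purely for illustration):

```javascript
// The bubble context defines its own clean model...
class TimeEntry {
  constructor(person, date, hours) {
    this.person = person;
    this.date = date;
    this.hours = hours;
  }
}

// ...and a repository whose implementation is the anti-corruption layer:
// it fetches legacy rows and translates them, so legacy naming and
// modeling quirks never leak into the new model.
class AclTimeEntryRepository {
  constructor(legacyDb) {
    this.legacyDb = legacyDb;
  }

  entriesFor(person, date) {
    return this.legacyDb
      .fetchRows(person, date)           // legacy shape, e.g. minutes
      .map(row => new TimeEntry(person, date, row.mins / 60));
  }
}
```

The new model only ever sees `TimeEntry`; the legacy schema stays behind the repository boundary.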

A final twist

Just to make things more interesting, I’m planning on doing the new code in Clojure. There are a couple of reasons for this:

  • I just like learning new stuff, and Clojure is new for me.
  • I’ve been encountering Lisps more and more over the last couple of years, and people that I highly respect often speak about Lisps in high regard. So it’s about time I figure out what all the fuss is about.
  • I’d like to try something outside of .NET, for numerous reasons.
  • Lisps are known for their ability to nicely do DSLs, and that seems a good fit for DDD.
  • I want to see how DDD patterns map to a more functional language, and specifically what impact that has on modeling.
  • I wonder how interactive programming (with a REPL) works in real life.

My experience with Clojure thus far amounts to some toy projects and reading The Joy of Clojure, but that’s about it. So expect me to make a lot of rookie mistakes, and please tell me when I do :)

Next steps

All the new code will be open source and on GitHub. I probably won’t be able to open-source the code for the original application, but I hope I can publish enough to be able to run the ACL. That should be enough to get the entire application running. I hope to get the first code out in a couple of weeks.

.NET is slowing me down

.NET has been my primary development environment for a little over 5 years now. I’ve always really liked it, had a lot of success with it, and learned a lot while using it. The tooling and maturity of the platform are, and have been, right where they had to be for me. In a lot of projects it allowed me to really focus on the domain, and I seldom had to write custom tooling for standard stuff, which I did have to on other platforms. This allowed me to deliver a lot of value to my clients, and they’re happy about that.

There is, however, a problem that’s growing bigger and bigger with .NET: it’s getting slow to develop on. This doesn’t mean .NET itself is getting slow; it means the developer experience is getting slower. To illustrate my point, I’ve measured the time to the very first byte (that is, the time to serve the first request after a rebuild) for the template applications of various MVC versions:

.NET Version   MVC Version   First available date   Time to very first byte (seconds)
4.0            2             March 2010             1.00
4.0            3             January 2011           1.12
4.5.2          3             May 2014               1.45
4.0            4             August 2012            2.63
4.5.2          4             May 2014               2.89
4.5.2          5.2.3         January 2015           3.47
4.6            5.2.3         July 2015              3.58
4.6            6.0.0-beta5   July 2015              1.89

So, over the course of 5 years the time to load the first page has increased by a factor of 3.5, or 2.5 seconds in absolute terms. It seems ASP.NET 5 is going to reduce these times a bit, but still not to the 2010 level.

To make matters worse, something like Entity Framework is getting slower in equal measure, and hitting a page that goes to the database might easily take somewhere between 5 and 10 seconds. The same goes for tests: running the first one easily takes a couple of seconds due to EF alone.

Environmental Viscosity

So, what’s the problem? Environmental viscosity. To quote Uncle Bob from PPP:

Viscosity of the environment comes about when the development environment is slow and inefficient. For example, if compile times are very long, developers will be tempted to make changes that don’t force large recompiles, even though those changes don’t preserve the design.

This is exactly what’s going on here. Because load times are slow, I tend to:

  • Make bigger changes before reloading
  • Write fewer tests
  • Write tests that test larger portions of functionality
  • Implement back-end code in the front-end (HTML/JavaScript)
  • Visit reddit while the page loads

All of these things are undesirable. They slow me down and compromise the quality of the software. If you’ve ever worked with “enterprise” CMS software, you’ve seen this happen to the extreme (I sure have): there might be minutes between making a change and the page actually loading.

Even if you don’t do any of the above and slavishly wait for the page to load or the test to run every time, you’re still wasting your time, which isn’t good. You might not recognize it as a big deal, but imagine making 500 changes every day: that translates to 500 × 5 s = 2,500 seconds of waiting. That’s more than 40 minutes of waiting, every day.

Architecture

To reiterate: slow feedback compromises software quality. What I want, therefore, is feedback on my changes within a second, preferably within 500 ms. This requirement will be a strong factor in my choice of technologies and tools.

For example, my choice for data access defaults to Dapper these days, because it’s just much faster than EF (to be fair, I also rely less on “advanced” mappings). Even something like PHP, for all its faults, tends to have a time to very first byte that’s an order of magnitude faster than that of .NET apps, and is therefore something I might consider when other .NET qualities aren’t that important.

To me, the development experience is as much a part of software architecture as anything else: I consider anything related to building software part of the architecture, and since slow feedback compromises software quality, the development experience certainly belongs there.

The future of .NET

I certainly hope Microsoft is going to improve on these matters. There is some hope: ASP.NET 5 and Entity Framework 7 are right around the corner, and they promise to be lighter-weight, which I hope translates into faster start-up times. Also, Visual Studio 2015 seems to be a bit faster than 2013 (which was, and is, terrible), but not as fast as VS2012. I guess we’ll have to wait and see. For the time being, though, I’ll keep weighing my options.

Start closing the end user feedback loop!

The most important feedback loop in any software development project is the feedback you get from end users. The reason is simple: they’re the ones actually using the product, and they’re the ones paying for it, either directly or indirectly. If your product isn’t being used, I can guarantee you that development on it will end sooner rather than later.

Unfortunately, this doesn’t seem to be common knowledge. Instead, we tend to focus on the feedback of the client and (implicitly) assume that when the client is happy, the end user will be happy. And since it’s the client that pays us, it seems only reasonable that it’s their feedback that counts. Well, it turns out that clients in general aren’t that much better at figuring out what their users want. Instead, they rely on feedback from those actual users to decide what the next feature is going to be, or what needs to be improved.

Having efficient ways to gather such feedback is therefore of extreme importance, yet often overlooked. In a lot of cases, feedback is only being gathered by physically talking to the users. While this results in high quality feedback, it’s not very efficient and the probability of missing things is very high.

Luckily, we, as developers, can help: there are a lot of technological ways to gather feedback more efficiently, and it’s our responsibility to make those methods available to our clients. Below are 5 techniques you can use to start shortening the feedback loop today.

Five things you can start doing today

Analyze web server log files

The web server logs contain a wealth of information. For example, they can help you:

  • Find out which features are used most often by looking at the request paths
  • Look for bad user experiences by seeing which requests have high response times or error responses
  • See when your users use the product most. Does that map to what you expect?
  • Figure out which users are heavy users
  • Track individual users as they browse through your site/app

It’s easy to analyze log files with some custom code, or you can use something like Log Parser.
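As a sketch of how little custom code this takes, here’s a parser for access-log lines with a trailing response-time field (the exact log format here is an assumption; adjust the regex to match yours):

```javascript
// Parse one access-log line, e.g.:
// '1.2.3.4 - - [10/Oct/2015:13:55:36] "GET /report HTTP/1.1" 500 2326 412'
// (last field assumed to be response time in ms).
function parseLine(line) {
  const m = line.match(/"(\w+) (\S+) [^"]*" (\d{3}) \d+ (\d+)/);
  if (!m) return null;
  return { method: m[1], path: m[2], status: +m[3], timeMs: +m[4] };
}

// Find bad user experiences: server errors or slow responses.
function badExperiences(lines, slowMs = 1000) {
  return lines
    .map(parseLine)
    .filter(r => r && (r.status >= 500 || r.timeMs > slowMs));
}
```

From here, grouping by path or by user is a one-liner away.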

Setup Google Analytics events

If you’re using Google Analytics, you can use events to track user actions: see which buttons users click, whether they scroll, etc. This helps you figure out whether users actually interact with your site/app as expected.
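With the analytics.js API, that can be as small as this (the category/action/label values are made-up examples):

```javascript
// Small helper around analytics.js event tracking. The guard keeps the
// page working when the analytics script is blocked or hasn't loaded.
function trackEvent(category, action, label) {
  if (typeof ga === 'function') {
    ga('send', 'event', category, action, label);
  }
}

// e.g. wire it to a button:
// signupButton.addEventListener('click',
//   () => trackEvent('Signup', 'click', 'homepage-hero'));
```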

Install chat software

In-page chat widgets are popping up everywhere. You can install one to provide an easily accessible way for users to contact you. Make sure someone is actually answering the chat, though, or you might leave a bad impression.

Investigate abandoned funnels

Funnels can be abandoned for many reasons: it might indicate a use case you didn’t expect, maybe there was a technical problem or the user changed their mind. Either way, it’s interesting for you to know why. Use any method you have available to figure out why they happen: correlate logs, events, chats, etc. If you have an e-mail address, send them an e-mail to ask why.

A/B testing

A/B testing can help you figure out what your users care and don’t care about. Both are equally important: if they care about something, do it more. If they don’t care about something: don’t try it again and focus on things that do work. You can write the infrastructure yourself, but there are off-the-shelf solutions available as well.
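If you do roll your own, the core of it is deterministic assignment: the same user should always land in the same variant. A sketch:

```javascript
// Deterministically assign a user to a variant by hashing their id,
// so repeat visits see the same experience. The hash is a simple
// illustrative rolling hash, not a library function.
function variantFor(userId, variants = ['A', 'B']) {
  let hash = 0;
  for (const ch of String(userId)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it 32-bit
  }
  return variants[hash % variants.length];
}
```

Record the variant alongside your conversion events, and the analysis reduces to comparing rates per variant.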

Truly care

Just setting things up is not enough; you should also deeply care about the results. If anything looks inconsistent, you should investigate it. If data isn’t being collected, you should find out why. If some hypothesis is not coming true, you should think of ways to figure out why that is. Do whatever you can to learn more about your users.

Every person on a development team should be aware of who the users are, why they’re using the product, why they keep coming back, etc. It’s not just for one single role within the team (think Product Owner) to care about this stuff; everybody should feel responsible. If everybody in the team cares deeply about the users, the product will become much better, there will be more alignment, and your work will be more satisfying.

Vectors as ADT

I talked about autognostic objects a couple of weeks ago, and in that post contrasted them with abstract data types (ADTs). I promised to follow up with a post on an ADT implementation, so here it is.

First of all, let’s state the autognosis property once again: an autognostic object can only have detailed knowledge of itself. This constraint is required for objects, but not for ADTs. On the contrary: ADTs are allowed (maybe even expected) to inspect detailed information from other values of their own type (and only of their type).

From that point of view, it’s perfectly fine to implement the Vector add operation in an ADT as follows:
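A minimal JavaScript version (constructor and field names assumed):

```javascript
// A Vector ADT: the representation (x and y) is conceptually hidden
// behind the type name Vector.
function Vector(x, y) {
  this.x = x;
  this.y = y;
}

// add freely inspects the addend's private data, because both values
// are of the same abstract type.
Vector.prototype.add = function (that) {
  return new Vector(this.x + that.x, this.y + that.y);
};
```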

As you can see, we blatantly access the private data (the x and y) of the addend in order to perform the calculation. We can do this because both the augend and the addend are of type Vector, and ADTs are allowed to access each other’s private data when they’re of the same type.

The name Vector denotes a type abstraction. With this kind of abstraction, the abstraction boundary is based on a type name (Vector). This means that as a client all you can see is the type and operations, but the implementation is hidden. “Within” the type, though, you have full access to the implementation and representations. It also means that, contrary to objects, you cannot easily interoperate with other values, since they have a different type and therefore have a hidden representation. All ADTs are based on type abstraction.

This also has some implications for extensibility; specifically, an ADT has to know about all possible representations. To see that, let’s say we again want to add a polar representation for the Vector. We do this so we can keep full accuracy when creating a vector from polar coordinates, accuracy that would be lost if we converted it to rectangular coordinates first. In JavaScript, we can implement that as follows:
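A sketch of what that can look like (names assumed):

```javascript
// The Vector ADT now has two representations, and every operation has
// to know about both of them.
function Vector(rep, a, b) {
  this.rep = rep; // 'rect' or 'polar'
  if (rep === 'rect') {
    this.x = a;
    this.y = b;
  } else {
    this.r = a;
    this.theta = b;
  }
}

// Every operation starts by dispatching on the representation...
function toRect(v) {
  return v.rep === 'rect'
    ? { x: v.x, y: v.y }
    : { x: v.r * Math.cos(v.theta), y: v.r * Math.sin(v.theta) };
}

Vector.prototype.add = function (that) {
  const a = toRect(this);
  const b = toRect(that);
  return new Vector('rect', a.x + b.x, a.y + b.y);
};
```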

It isn’t pretty, but in languages that have sum types and static typing it tends to work a bit better.

The important thing is that we had to change the ADT significantly to support the new representation. In fact, every new representation will require changes to the ADT. Compare that to objects, where we were able to add new representations without changing any of the existing ones. The reason is that ADTs are abstracted by type, while objects are abstracted by interface.

In general, ADTs are much less suited to adding new representations than objects are. It turns out this difference in extensibility is at the heart of the differences between ADTs and objects, and I’ll dive into that further in a future post. Don’t think that all is bad with ADTs, though; they have other qualities… If you’d like a sneak peek, check out the Expression Problem on Wikipedia.