Anatomy of a geth full sync

Last week I blogged about my experiences doing a geth fast sync. The last thing I did back then was start a full sync on the same hardware. It took quite a bit longer: whereas the fast sync completed in about 8 hours, the full sync took a little over 9 days. Here's my report.

Specs

I used an Azure Standard_L16s storage optimized VM. This beast has 16 cores, 128 gigs of memory, 80,000 IOPS and 800 MBps of throughput on its temporary storage disk. Ought to be enough, you'd say. I started geth with:

./geth --maxpeers 25 --cache 64000 --verbosity 4 --syncmode full >> geth.log 2>&1

Overview

Azure VM Instance: Standard_L16s
OS: Ubuntu 16.04.4 LTS
CPU: 16 cores
Memory: 128GB
Disk IOPS (spec): 80,000
Disk throughput (spec): 800 MBps
Geth version: geth-linux-amd64-1.8.3-329ac18e
Geth maxpeers: 25
Geth cache: 64,000MB
Sync mode: full

Results

Start time: 3 Apr 2018 06:26:58 UTC
End time*: 12 Apr 2018 08:02:37 UTC
Total duration: 9d 1h 35m 39s
Imported blocks at catch-up time: 5,426,156
Total imported state trie entries: ? (I don't know how to check this, nor whether it's even relevant for a full sync)
du -s ~/.ethereum: 244,752,908 (234G)

* End time defined as the first single-block "Imported new chain segment" log message for which all subsequent "Imported new chain segment" log messages have blocks=1
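To find that point in the log, something like the following sketch works (it assumes the blocks=N field that geth 1.8.x prints on its "Imported new chain segment" lines):

```sh
# Print the first import line after which every subsequent
# "Imported new chain segment" message reports blocks=1.
grep 'Imported new chain segment' geth.log | awk '
  /blocks=1([^0-9]|$)/ { if (start == "") start = $0; next }
  { start = "" }
  END { print start }'
```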

CPU/Load/Memory

(Cropping the charts to the actual start time was too much effort, so please bear in mind that they start at 3 Apr 06:00 UTC, while the graphs are in UTC+02:00. Sorry! Let me know if you need higher-resolution charts.)

Disk

Network

Peers

Blocks

Notes

I guess the notes around peers still stand, though I didn’t test that explicitly for full sync:

  • Firewall needs to be open for port 30303 (I opened both UDP and TCP); otherwise you won't get enough peers. See the sketch after this list.
  • Syncing actually seems to take more time with more peers. I settled on the default of 25. With 100 peers it was much slower.
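For reference, opening the port with ufw on Ubuntu could look like the sketch below (on Azure you would also need a matching network security group rule, which I'm taking as given here):

```sh
# Allow Ethereum p2p traffic on both transports.
sudo ufw allow 30303/tcp
sudo ufw allow 30303/udp
```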

Conclusions

Clearly, doing a full sync takes much longer than a fast sync: over 9 days versus about 8 hours. From my data, it looks like CPU is the bottleneck here. What surprises me is that the block rate is very "bursty". The following pattern repeats itself over the course of the entire sync: a burst of fast block imports, followed by a stretch where hardly any blocks come in.

I would expect the block rate to be fairly constant if the CPU were the bottleneck. I don't think the availability of blocks on the network is the problem here, since the fast sync also needs all the blocks, and that completed within 8 hours. I do see some correlation with memory activity, but I didn't dive into it any further. If someone has any ideas, I'd love to hear!

What also surprises me is that the Ethereum data is already larger than the entire Bitcoin data directory (about 200GB), while Bitcoin is almost 3 times as old as Ethereum. Clearly, Ethereum grows much faster than Bitcoin. I expect it'll become even harder to do full syncs in the future, which probably means the number of full nodes will decrease. That can't be good.

Hope this post was of some help. If you have results to share, please let me know.

Anatomy of a geth --fast sync

I've been reading up on Ethereum for the last couple of days. Apparently, doing the initial sync is one of the major issues people run into (at least with geth). That includes me. I first tried syncing on an HDD, and that didn't work. I then used a mediocre machine with SSD, but it still kept on running with no apparent end in sight. So I decided to use a ridiculously large machine on Azure and sync there. It turns out that with this machine I was able to do a --fast sync in a little under 8 hours.

Specs

I used an Azure Standard_L16s storage optimized VM. This beast has 16 cores, 128 gigs of memory, 80,000 IOPS and 800 MBps of throughput on its temporary storage disk. Ought to be enough, you'd say. I started geth with:

./geth --maxpeers 25 --cache 64000 --verbosity 4 >> geth.log 2>&1

Overview

Azure VM Instance: Standard_L16s
OS: Ubuntu 16.04.4 LTS
CPU: 16 cores
Memory: 128GB
Disk IOPS (spec): 80,000
Disk throughput (spec): 800 MBps
Geth version: geth-linux-amd64-1.8.3-329ac18e
Geth maxpeers: 25
Geth cache: 64,000MB

Results

For reference, a fast sync proceeds in two phases: first the blocks are downloaded, then the state trie entries.

Start time: 2 Apr 2018 20:46:43 UTC
End time*: 3 Apr 2018 04:27:15 UTC
Total duration: 7h 40m 32s
Imported blocks at catch-up time: 5,369,956
Blocks caught up: 3 Apr 2018 00:11:08 UTC (after 3h 24m 25s)
Total imported state trie entries: 114,566,252
State caught up: 3 Apr 2018 04:24:07 UTC (after 7h 37m 24s)
du -s ~/.ethereum: 77,948,852 (75G)

* End time defined as the first single-block "Imported new chain segment" log message

CPU/Load/Memory

Disk

Network

Peers

Blocks

State trie

Notes

  • Firewall needs to be open for port 30303 (I opened both UDP and TCP). Otherwise you won’t get enough peers.
  • Syncing actually seems to take more time with more peers. I settled on the default of 25. With 100 peers it was much slower.
  • Importing the chain segments did not take significant time, contrary to the comment mentioned in the GitHub issue.

Conclusions

Disk IO is mostly used while fetching the blocks. After that, the system's resources are barely used, which makes me think the bottleneck is the network. Even during block syncing the resources are nowhere near maxed out, so the process is probably constrained by the network the entire time. I'm not familiar enough with Geth/Ethereum to say for sure, though. As stated above, increasing the number of peers didn't improve the situation; it made it worse.

Hope this post was of some help. If you have results to share, please let me know.

DDD Layered architecture in Clojure: A first try

The first step in my effort to freshen up our time tracker using DDD & Clojure has been finding a way to structure my code. Since I don't have that much Clojure experience yet, I decided to take the DDD layered architecture and port it as directly as possible. This probably isn't really idiomatic Clojure, but it gives me a familiar start. This post should be regarded as exactly that: a first try. If you know better ways, don't hesitate to let me know.

The architecture

As a picture is worth a thousand words:

[diagram: the DDD layered architecture, with UI, Application, Domain and Data layers]
This architecture is mostly the same as the one advocated in the DDD Blue Book, except that the Domain Layer does not depend on any data-related infrastructure and there’s a little CQRS mixed in. I think this is mostly standard these days. In this design, the application layer is responsible for transaction management. The ‘Setup’ part of the UI layer means setting up things like dependency injection.

In this post I’ll focus on the interaction between application services, domain objects, repositories and the data layer. I’ll blog about other parts (such as validation) in later posts.

The domain

Unfortunately, I'm not able to release the code for the time tracker just yet (due to some issues with the legacy code). So for this post I'll use an example domain with just one entity for now… Cargo 🙂 The Cargo currently has one operation: being booked onto a Voyage.

The approach

Let’s start with the Domain Layer. Here, we need to define an “object” and an “interface”: the Cargo and CargoRepository respectively.

Cargo entity

The Cargo entity is implemented as a simple record containing the fields cargo-id, size and voyage-id. I've defined a constructor create-new-voyage which does its input validations using pre-conditions.

There's one domain operation, book-onto-voyage, which books the cargo on a voyage. For now, the requirement is that it can't already be booked on another Voyage. (Remember, this post is about the overall architecture, not the domain logic itself; that's for a later post.)

Furthermore, there is a method for setting the cargo-id, since we rely on the data store to generate it for us, which means we don't have it yet when creating a new cargo.

Here’s the code:
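A minimal sketch, reconstructed from the description above (the exact pre-conditions are illustrative assumptions):

```clojure
(ns domain.cargo)

;; A Cargo is a simple record: an id, a size, and the voyage it is
;; booked onto (nil while not booked).
(defrecord Cargo [cargo-id size voyage-id])

(defn create-new-voyage
  "Constructor; validates its input with pre-conditions. The cargo-id
  is nil until the data store has generated one."
  [size]
  {:pre [(number? size) (pos? size)]}
  (->Cargo nil size nil))

(defn set-cargo-id
  "We rely on the data store to generate the id, so it is set on the
  entity only after insertion."
  [cargo cargo-id]
  (assoc cargo :cargo-id cargo-id))

(defn book-onto-voyage
  "Books the cargo onto a voyage; it may not already be booked."
  [cargo voyage-id]
  {:pre [(nil? (:voyage-id cargo))]}
  (assoc cargo :voyage-id voyage-id))
```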

Cargo Repository

The Cargo Repository consists of two parts: the interface, which lives in the domain layer, and the implementation, which lives in the data layer. The interface is very simple and implemented using a Clojure protocol. It has three functions: -find, -add! and -update!.

A note about concurrency: -find returns both the cargo entity and the version as it exists in the database in a map: {:version a-version :cargo the-cargo}. When doing an -update! you need to pass in the version, so the optimistic concurrency check can be performed. (I'm thinking of returning a vector [version cargo] instead of a map, because destructuring the map every time hurts readability in client code.)

Furthermore, I've defined convenience methods find, add! and update!, which are globally reachable and rely on a call to set-implementation! when setting up the application. This is to avoid having to pass (read: dependency inject) the correct repository implementation down the stack. This is probably a bit controversial (global state, purity, etc.), and I look forward to exploring and hearing about alternatives.
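Here's a sketch of how that could look (my reconstruction; backing set-implementation! with an atom is just one simple option):

```clojure
(ns domain.cargo-repository
  (:refer-clojure :exclude [find]))

;; The interface: a Clojure protocol that lives in the domain layer.
;; -find returns {:version n :cargo cargo}; -update! takes the version
;; back for the optimistic concurrency check.
(defprotocol CargoRepository
  (-find    [repo cargo-id])
  (-add!    [repo cargo])
  (-update! [repo version cargo]))

;; The implementation is registered once during application setup.
(def ^:private implementation (atom nil))

(defn set-implementation! [repo]
  (reset! implementation repo))

;; Globally reachable convenience functions, so the repository does not
;; have to be passed down the stack.
(defn find    [cargo-id]      (-find @implementation cargo-id))
(defn add!    [cargo]         (-add! @implementation cargo))
(defn update! [version cargo] (-update! @implementation version cargo))
```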

Cargo repository MySQL implementation

I’m using MySQL as the data store, and clojure.java.jdbc for interaction with it. The cargoes are mapped to one table, surprisingly called cargoes. I don’t think there’s anything particular to the implementation, so here it goes:
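A sketch of the implementation, assuming a cargoes table with id, size, voyage_id and version columns (the version column backs the optimistic concurrency check):

```clojure
(ns data.cargo-repository-mysql
  (:require [clojure.java.jdbc :as jdbc]
            [domain.cargo :as cargo]
            [domain.cargo-repository :as repo]))

(defn- row->cargo [row]
  (cargo/->Cargo (:id row) (:size row) (:voyage_id row)))

(defrecord MySQLCargoRepository [db-spec]
  repo/CargoRepository
  (-find [_ cargo-id]
    (when-let [row (first (jdbc/query db-spec
                            ["SELECT * FROM cargoes WHERE id = ?" cargo-id]))]
      {:version (:version row) :cargo (row->cargo row)}))
  (-add! [_ cargo]
    ;; MySQL generates the id; return it so the caller can set it on the entity.
    (->> (jdbc/insert! db-spec :cargoes {:size      (:size cargo)
                                         :voyage_id (:voyage-id cargo)
                                         :version   1})
         first vals first))
  (-update! [_ version cargo]
    ;; Optimistic concurrency: the row must still be at the version we read.
    (let [updated (first (jdbc/update! db-spec :cargoes
                           {:size      (:size cargo)
                            :voyage_id (:voyage-id cargo)
                            :version   (inc version)}
                           ["id = ? AND version = ?" (:cargo-id cargo) version]))]
      (when (zero? updated)
        (throw (ex-info "Optimistic concurrency conflict"
                        {:cargo-id (:cargo-id cargo) :version version}))))))
```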

The final parts are the Application Services and the UI.

The Application Service

I never have good naming conventions (or, almost equivalently, partitioning criteria) for application services. So I've just put everything in a namespace called application-service, containing functions for all domain operations. The operations can be taken directly from the Cargo entity: creating a new one, and booking it onto a voyage. I use the apply construct to invoke the entity functions, to avoid repeating all their parameters.

Code:
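A minimal sketch, reconstructed from the description above:

```clojure
(ns application.application-service
  (:require [domain.cargo :as cargo]
            [domain.cargo-repository :as repo]))

(defn create-cargo!
  "Creates a new cargo and returns the id generated by the data store.
  `apply` forwards all arguments to the entity constructor, so the
  parameter list is not repeated here."
  [& args]
  (repo/add! (apply cargo/create-new-voyage args)))

(defn book-cargo-onto-voyage!
  "Loads the cargo, performs the domain operation and saves the result,
  passing the version back for the concurrency check."
  [cargo-id voyage-id]
  (let [{:keys [version cargo]} (repo/find cargo-id)]
    (repo/update! version (cargo/book-onto-voyage cargo voyage-id))))
```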

The tests

To not make this post any longer than it already is I’m not going to show a full UI, but a couple of tests exercising the Application Service instead. This won’t show how to do queries for screens, but for now just assume that I more or less directly query the database for those.

There isn’t much to tell about the tests. If you are not that familiar with Clojure, look for the lines starting with deftest, they define the actual tests. The tests show how to use the application service API to handle commands. They test the end result of the commands by fetching the cargo from the repository and checking its state. I use the MySQL implementation for the database, since I already have it and it performs fine (for now).
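A couple of tests in that style (a sketch; the sizes and voyage ids are arbitrary, and it assumes set-implementation! has been called with the MySQL repository during test setup):

```clojure
(ns application.application-service-test
  (:require [clojure.test :refer [deftest is]]
            [application.application-service :as app]
            [domain.cargo-repository :as repo]))

;; Assumes set-implementation! has been called with the MySQL
;; implementation (e.g. in a clojure.test fixture) and the database
;; is reachable.

(deftest create-new-cargo
  (let [cargo-id (app/create-cargo! 100)
        {:keys [cargo]} (repo/find cargo-id)]
    (is (= 100 (:size cargo)))
    (is (nil? (:voyage-id cargo)))))

(deftest book-cargo-onto-voyage
  (let [cargo-id (app/create-cargo! 100)]
    (app/book-cargo-onto-voyage! cargo-id 42)
    (is (= 42 (:voyage-id (:cargo (repo/find cargo-id)))))))
```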

Conclusion

The code in this post is pretty much a one-to-one mapping from an OO kind of language to Clojure, which is probably not ideal. Yet, I haven't been able to find any good resources on how to structure a business application in more idiomatic Clojure, so this will have to do. Nevertheless, I still like the structure I have now. I think it's pretty clean, and I don't see any big problems (yet). I look forward to exploring more alternatives in the next couple of months, and I'll keep you updated.

All code (including tests) is available on GitHub.