Last week I blogged about my experiences doing a geth fast sync. The last thing I did back then was start a full sync on the same hardware. Things took a bit longer: whereas the fast sync completed in about 8 hours, the full sync took a little over 9 days. Here's my report.
I used an Azure Standard_L16s storage-optimized VM. This beast has 16 cores, 128 gigs of memory, and a temporary storage disk rated at 80,000 IOPS and 800 MBps throughput. Ought to be enough, you'd say. I started geth with:
./geth --maxpeers 25 --cache 64000 --verbosity 4 --syncmode full >> geth.log 2>&1
| Parameter | Value |
|---|---|
| Azure VM instance | Standard_L16s |
| OS | Ubuntu 16.04.4 LTS |
| Disk IOPS (spec) | 80,000 |
| Disk throughput (spec) | 800 MBps |
| Start time | 3 Apr 2018 06:26:58 UTC |
| End time * | 12 Apr 2018 08:02:37 UTC |
| Total duration | 9d 1h 35m 39s |
| Imported blocks at catch-up time | 5,426,156 |
| Total imported state trie entries | Unknown (I don't know how to check this, or whether it's even relevant for a full sync) |
| `du -s ~/.ethereum` | 244,752,908 (1K blocks, i.e. 234G) |
* End time is defined as the timestamp of the first single-block “Imported new chain segment” log message after which all subsequent “Imported new chain segment” messages have blocks=1.
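The end-time definition above can be applied mechanically to geth.log. Below is a minimal Python sketch, not necessarily the exact method behind the numbers in the table. It assumes geth's terminal log format of this era (`INFO [04-12|08:02:37] Imported new chain segment  blocks=1 ...`, no year in the timestamp, timestamps in UTC), and the names `SEGMENT_RE` and `find_catch_up_time` are hypothetical. It keeps a candidate time that is reset whenever a multi-block batch appears, so whatever candidate survives to the end of the log is the catch-up moment. The last lines recompute the total duration from the start and end times in the table.

```python
import re
import sys
from datetime import datetime

# Assumed log line shape (geth terminal format; the timestamp has no year):
#   INFO [04-12|08:02:37] Imported new chain segment   blocks=1 txs=87 ...
# Optional fractional seconds are tolerated in case the format includes them.
SEGMENT_RE = re.compile(
    r"\[(\d{2})-(\d{2})\|(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?\]\s+"
    r"Imported new chain segment\s+.*?\bblocks=(\d+)"
)


def find_catch_up_time(lines, year):
    """Return the time of the first single-block import after which every
    later 'Imported new chain segment' message also has blocks=1.

    Returns None if the log never settles into single-block imports.
    Does not handle a sync that crosses a year boundary.
    """
    candidate = None
    for line in lines:
        m = SEGMENT_RE.search(line)
        if m is None:
            continue  # not an import message
        month, day, hour, minute, second, blocks = (int(g) for g in m.groups())
        if blocks == 1:
            if candidate is None:
                candidate = datetime(year, month, day, hour, minute, second)
        else:
            # A multi-block batch means the node was still catching up.
            candidate = None
    return candidate


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:  # e.g. geth.log
            print(find_catch_up_time(log, 2018))
    # Total duration from the start and end times in the table.
    start = datetime(2018, 4, 3, 6, 26, 58)
    end = datetime(2018, 4, 12, 8, 2, 37)
    print(end - start)  # 9 days, 1:35:39
```

Resetting the candidate on every multi-block batch is what makes this match the "all subsequent messages" part of the definition without having to scan the log twice.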
(Trimming the chart to the actual start was too much effort, so please bear in mind that the sync starts on 3 Apr at about 06:27 UTC, while the chart is in UTC+02:00. Sorry about that; let me know if you need higher-resolution charts.)
I assume the notes about peers from the fast sync post still apply, though I didn't test them explicitly for the full sync:
- The firewall needs to be open for port 30303 (I opened both TCP and UDP; geth uses TCP for peer connections and UDP for peer discovery). Otherwise you won't get enough peers.
- Syncing actually seems to take longer with more peers. With 100 peers it was much slower, so I settled on the default of 25.
Clearly, a full sync takes much longer than a fast sync: over 9 days vs. about 8 hours. From my data, it looks like the CPU is the bottleneck here. What surprises me is that the block rate is very “bursty”. The following pattern repeats itself over the course of the entire sync:
If the CPU were the bottleneck, I would expect a fairly constant block rate. I don't think the availability of blocks on the network is the problem, since the fast sync also needs all the blocks and it finished within 8 hours. I do see some correlation with memory activity, but I didn't dig into it any further. If anyone has ideas, I'd love to hear them!
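To put a number on the burstiness: averaged over the whole sync, 5,426,156 blocks in 783,339 seconds (9d 1h 35m 39s) is roughly 6.9 blocks per second, or about 415 blocks per minute. A per-minute histogram built from geth.log shows how far the actual rate swings around that average. Below is a minimal Python sketch under assumptions: the log line format in the comment is assumed from geth's terminal output, and `blocks_per_minute` is a hypothetical helper, not part of geth.

```python
import re
import sys
from collections import Counter

# Assumed log line shape (geth terminal format), e.g.
#   INFO [04-05|13:37:12] Imported new chain segment   blocks=23 txs=3011 ...
# Group 1 captures the minute ("MM-DD|HH:MM"), group 2 the block count.
SEGMENT_RE = re.compile(
    r"\[(\d{2}-\d{2}\|\d{2}:\d{2}):\d{2}(?:\.\d+)?\]\s+"
    r"Imported new chain segment\s+.*?\bblocks=(\d+)"
)


def blocks_per_minute(lines):
    """Sum imported blocks per minute bucket, keyed by 'MM-DD|HH:MM'.

    Minutes without any import message are simply absent, so gaps in the
    sorted keys correspond to stalls.
    """
    counts = Counter()
    for line in lines:
        m = SEGMENT_RE.search(line)
        if m:
            counts[m.group(1)] += int(m.group(2))
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:  # e.g. geth.log
            # Keys sort chronologically as long as the log stays within one year.
            for minute, blocks in sorted(blocks_per_minute(log).items()):
                print(minute, blocks)
```

Plotting this output next to the CPU and memory metrics would show whether the bursts line up with the memory activity.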
What also surprises me is that the Ethereum data directory is already larger than Bitcoin's entire data directory (about 200GB), even though Bitcoin is more than three times as old as Ethereum (January 2009 vs. July 2015). Clearly, Ethereum grows much faster than Bitcoin. I guess full syncs will become even harder in the future, which will probably mean the number of full nodes decreases. That can't be good.
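A rough way to quantify "grows much faster" is average growth per year of chain age. This Python sketch is only back-of-the-envelope: it assumes linear growth (which it isn't), uses the Bitcoin genesis block date (3 January 2009) and Ethereum's Frontier launch (30 July 2015), takes the roughly 200GB Bitcoin figure at face value, and treats the two data directories as comparable even though they store different things. The function names are hypothetical.

```python
from datetime import date

BITCOIN_LAUNCH = date(2009, 1, 3)    # Bitcoin genesis block
ETHEREUM_LAUNCH = date(2015, 7, 30)  # Ethereum Frontier release
MEASURED_ON = date(2018, 4, 12)      # end of the full sync


def age_years(launch, on=MEASURED_ON):
    """Chain age in years, using an average Gregorian year length."""
    return (on - launch).days / 365.25


def gib_from_du(kib_blocks):
    """GNU `du -s` reports sizes in 1024-byte blocks by default."""
    return kib_blocks / 1024 ** 2


eth_gib = gib_from_du(244_752_908)  # du -s ~/.ethereum, from the table
btc_gib = 200                       # approximate Bitcoin data directory size

# Crude linear average: total size divided by chain age.
eth_rate = eth_gib / age_years(ETHEREUM_LAUNCH)
btc_rate = btc_gib / age_years(BITCOIN_LAUNCH)

print(f"age ratio: {age_years(BITCOIN_LAUNCH) / age_years(ETHEREUM_LAUNCH):.1f}")
print(f"Ethereum: {eth_gib:.0f} GiB, ~{eth_rate:.0f} GiB/year")
print(f"Bitcoin:  ~{btc_gib} GB, ~{btc_rate:.0f} GB/year")
```

With these inputs the age ratio comes out at about 3.4, and the average growth is roughly 86 GiB/year for Ethereum versus roughly 22 GB/year for Bitcoin, about four times as fast.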
Hope this post was of some help. If you have results to share, please let me know.