CouchDB compaction - big impacts
Tuesday, December 8, 2009 at 12:07PM
CouchDB needs to have its databases compacted regularly. It’s quite easy to do, but that ease may lead you into thinking it’s not worthy of serious consideration. You need to be aware of a few things.
Here at the beeb we have many databases of very different sizes, with very different “busy times”, and to be honest we don’t really know nor care what data is in them. Some documents are very simple, some we know are serialised PHP objects, some are JSON base64 encodings of images (we do not allow attachments) - some docs are less than 50 bytes, some are pushing 1 meg. Some databases have their busy times at 2am - others peak at the weekend, when television shows are broadcast.
We had a bug in some code a while ago (not CouchDB code) that meant we wanted to not compact our databases for a while. Ordinarily we compact daily - but for one namespace (renamed as “brain test” below) we wanted to keep all previous versions of data for a while (and still do, so I did not compact it yesterday). As it was a “hmm - we need to do a one-off here” moment, I compacted everything (nearly together) instead of using our system code’s more gentle “one at a time” approach.
Here are some charts to see what happened.
This chart is going to take some explaining:
- Think of it like a stock chart: the blue is the “volume” - the overall size on disk of the database. Its axis is on the left drawn on a log scale given the range of sizes. The biggest was ~ 150 GB, the smallest 80k - but it’s not really that important here.
- The top of the red bar is the relative size of the database before the compaction started - that’s why they are all “one” on the right-hand axis
- The bottom of the red bar is the relative size of the database after the compaction finished. Some of the databases did not compact much, some compacted down to about 2% of their original size
- The thin black lines going up are the amount of space that the compaction took during the process. For namespaces that compact really well, that line is short and the “high tide mark” of the amount of space used is not much above the opening size. However - for namespaces that do not compact well - this high tide mark is as big as the original database. More on this watch-point later.
So - what can we see here?
- The biggest - “big page” - saved 32% of its space - it ended up at about 100 GB in the end.
- “a cache” which was the second biggest compacted REALLY well - due essentially to the very high number of revisions that some of its keys had. It started at 129 gig - finished at 7 gig. That was really quite a nice saving.
- I didn’t compact “brain test” so it finished up the same size - but look at its “high” bar. If I had compacted it - it would have taken up quite a bit of space. The high bar ought not be there as it didn’t undergo the taking-up-more-space-temporarily compaction process - it’s a flaw in my spreadsheet.
- As you look at the red bars, you can see that most databases compact well - but small ones look like they don’t. This is because small ones are not being revised much - there’s very little going on in them so there’s little changing and hence little for the compactor to free up.
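The numbers behind the red bars and the “high tide mark” can be worked through with a little arithmetic. This is only a sketch: the function name and its fields are made up for illustration, and it assumes the worst case in which the old file and the freshly written compacted file sit on disk together until the swap, ignoring any writes that arrive while compaction is running.

```python
def compaction_summary(before_gb: float, after_gb: float) -> dict:
    """Summarise one compaction run from its before/after sizes (in GB).

    Hypothetical helper, not part of CouchDB. The peak figure assumes the
    original file and the compacted copy coexist until compaction finishes.
    """
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    relative = after_gb / before_gb
    return {
        "relative_size": relative,       # bottom of the red bar
        "saved_fraction": 1 - relative,  # share of the space freed
        "peak_gb": before_gb + after_gb, # rough high tide mark
    }


# "a cache" from the list above: started at 129 GB, finished at 7 GB.
summary = compaction_summary(129, 7)
print(summary)
```

For “a cache” that works out to roughly 5% of the original size, with a peak of about 136 GB. A database that barely shrinks peaks at close to twice its opening size, which is the case to plan capacity around.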
So that’s the good news. Now the bad.
Here’s what the process did to one of our 16 servers handling the CouchDB data:

For two hours these 8-core machines (2 x quad-core DL-380s) were taking a thumping. The ops crew came wandering over wondering what was going on. They know that big HTTP traffic can cause load on these boxes, but high load without traffic? That’s odd.
I don’t have the charts that show the slower response times this causes for users of the service, but they’ll look a bit like the load graph.
What can we take from this?
- Compact regularly - but only if the amount of free space on your drives is greater than the size of your biggest database. This is a hard limit that your “capacity planning” must take into account
- It can save you a lot of space
- It will heavily load your servers - perhaps for quite a while
- Tell your ops crew that this will happen and that they can expect this sort of load
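The free-space rule and the gentler “one at a time” approach could be sketched roughly as follows. This is not our system code: the server URL, the data directory argument and the function names are assumptions for illustration, and the `disk_size` and `compact_running` fields are taken from the database info document as it appears in the comments below, so they may differ in other CouchDB versions.

```python
import json
import shutil
import time
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address


def db_info(db: str, base: str = COUCH) -> dict:
    """Fetch the database info document (GET /db)."""
    with urllib.request.urlopen(f"{base}/{db}") as resp:
        return json.load(resp)


def has_headroom(disk_size: int, free_bytes: int) -> bool:
    """True if free space exceeds the database's current on-disk size."""
    return free_bytes > disk_size


def compact_one(db: str, data_dir: str, base: str = COUCH,
                poll_seconds: int = 30) -> bool:
    """Compact a single database, waiting for it to finish.

    Returns False without compacting if there is not enough free space.
    data_dir is assumed to be the directory holding the CouchDB files.
    """
    info = db_info(db, base)
    free = shutil.disk_usage(data_dir).free
    if not has_headroom(info["disk_size"], free):
        return False
    req = urllib.request.Request(
        f"{base}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
    # Poll until this compaction is done before the caller starts the next.
    while db_info(db, base)["compact_running"]:
        time.sleep(poll_seconds)
    return True
```

Calling `compact_one` for each database in turn keeps only one compaction running at a time, so the load and the temporary disk usage are spread out rather than landing all at once.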

Reader Comments (9)
There is room for an optimization here. Currently compaction blows the FS cache. If we set a flag on the .compact file that says "don't cache my writes" it should have a much smaller effect on perceived client performance, as the main db file won't get shoved out of RAM.
That would be great Chris :-)
My testing points to degrading compaction performance as the update sequence grows. We are just getting ready to run couch in full blown production, but in tests we discovered that (on a macbook pro, wall-clock):
1) Compaction with 100 documents takes a sec
2) Compaction with ~300k documents takes 30 seconds and butt loads of disk io, even if the db has not been modified
3) Deleted all documents, now an empty database, compaction still takes 30 seconds and butt loads of disk io
3a) Have run compaction several times on the db with stats: Documents:0, Size 274.1mb, Update Seq: 679976 @ 30 seconds
I love couch, I think compaction needs work, I haven't reported any of my findings to the lists yet because tests are tests and we'll see what happens in the real world. I'm sharing here as it seems your scenario can produce some good data on how to improve compaction (beyond the FS cache, there is also room to reclaim space).
@Troy, if for #3 you mean you took a database with 300k docs and deleted them all, it would make sense that the compaction would take some time. Given CouchDB's MVCC, append-only storage, you actually wrote a new rev to the disk for each doc to indicate that deletion. So, "delete" doesn't actually remove data, until compaction... :^D
@Troy - I can confirm ZachZ's point - and follow-up with a related point: even when a doc is deleted a record of it is kept under certain circumstances - most notably when the database is being replicated.
It's also worth trying these out on a staging or development server too as - great as they are - MacBook Pros do not behave the same as live servers under load - and here at the beeb we've completely given up on trying to extrapolate from a laptop to production servers. Fingers crossed that your circumstances allow for such kit!
For clarification, the database info is:
{"db_name":"tstconf","doc_count":0,"doc_del_count":325000,"update_seq":679976,"purge_seq":0,"compact_running":false,"disk_size":287428710,"instance_start_time":"1260317224801439","disk_format_version":4}
http://localhost:5984/tstconf/_all_docs takes 3 seconds and spikes cpu, returns:
{"total_rows":0,"offset":0,"rows":[]}
Compaction takes over a minute no matter how many times it is run.
Is it my macbook pro, or that I'm using .11 -- I don't know & don't care. Couch is going to allow us to do some really cool stuff in production, and if the same problems pop up, worst case scenario I'll spend a month of my life becoming an erlang hacker to help fix it ;)
@Troy if your goal is to make those documents "really" disappear you can purge them instead of deleting them. Purging removes all traces of a document from a database. This can be problematic if you want to replicate the DB with another server, but it is useful for some situations. The syntax is
POST /db/_purge -d '{"id1":["rev1"], "id2":["rev2"]}'
All that aside, CouchDB compaction could surely be more efficient. We'd love the help!
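For anyone scripting the purge call above, a minimal Python sketch might look like this. The helper name and the example server URL are assumptions; the body format simply follows the syntax quoted in the comment, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_purge_request(base: str, db: str,
                        revs_by_id: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /db/_purge request.

    revs_by_id maps each document id to the list of revisions to purge,
    mirroring the JSON body shown in the comment above.
    """
    body = json.dumps(revs_by_id).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{db}/_purge",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical ids and revs, matching the placeholders in the comment.
req = build_purge_request(
    "http://localhost:5984", "db", {"id1": ["rev1"], "id2": ["rev2"]}
)
# urllib.request.urlopen(req) would send it to a running server.
```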
Hey Enda, what are you doing thrashing my boxes? :)
Pramod - waking them up!