« I'm quoted at a recent Google Tech Talk | Main | iPhone push notifications »
Thursday
Mar042010

CouchDB at scale - 4 billion requests so far

I recently twittered about the number of requests our CouchDB instances have handled since they were last upgraded. As it was a large number it piqued some interest, so I thought I’d give a little background.

We have many nodes, for different reasons:

 

  • as our datasets will not fit onto a single “commodity” server, we shard our data onto a set of 8 nodes
  • for redundancy - to guard against physical hardware failure - we replicate copies to other machines which we can bring into service in seconds
  • for scalability and resilience - we have (currently) two datacentres, with replicated copies.

 

We run everything “master - master” - when nodes (running instances of CouchDB) replicate they do so bi-directonally. We don’t have a “master slave” setup anywhere.

Because we’re all a little busy, and because we haven’t had to, we’re still running a 0.9.0 release of CouchDB, which is now almost a year old. Importantly, that release didn’t have the continuous replication feature and without this we cause quite a large proportion of this traffic ourselves, polling for changes and replicating if we “notice” that something has indeed changed. Work is now underway to catch up with the later releases and we will see fewer requests when we change to use continuous replication.

 

Where did the numbers come from?

These figures are basically the adding together of the “_stats” numbers from within each of our live environments’ CouchDB nodes. Here’s the python code (which won’t work for you - you’ll need to substitute your own list of running nodes, and your own http library) which does this:

#!/usr/bin/env python
import sys
sys.path.append('../')
import couch # from ../

import simplejson

systemRequests = 0
getR = 0
putR = 0
postR = 0
deleteR = 0
headR = 0
for key in couch.nodeToHostPort:
	if key.count(".live.") == 1:
		node = couch.nodeToHostPort[key]
		response = couch.getHttp(node, "_stats")
		strResponse = response.read() 
		json = simplejson.loads(strResponse)
		systemRequests = systemRequests + 
		    int(json["httpd"]["requests"]["current"])
		if "GET" in json["httpd_request_methods"]:
			getR = getR + 
			    int(json["httpd_request_methods"]["GET"]["current"])
		if "PUT" in json["httpd_request_methods"]:
			putR = putR + 
			    int(json["httpd_request_methods"]["PUT"]["current"])
		if "POST" in json["httpd_request_methods"]:
			postR = postR + 
			    int(json["httpd_request_methods"]["POST"]["current"])
		if "DELETE" in json["httpd_request_methods"]:
			deleteR = deleteR + 
			    int(json["httpd_request_methods"]["DELETE"]["current"])
		if "HEAD" in json["httpd_request_methods"]:
			headR = headR + 
			    int(json["httpd_request_methods"]["HEAD"]["current"])
		print ".",
		sys.stdout.flush()
	else:
		pass
	
print("\n  http %15d,\n   get %15d,\n   put %15d,\n  post %15d,\ndelete %15d,\n  head %15d" 
    % ( systemRequests, getR, putR, postR, deleteR, headR))

Nothing too complicated there then eh?

 

Here’s what the current numbers are:

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
  http      3901098994,
   get      3793933108,
   put        32751243,
  post        74478011,
delete           26837,
  head             259

We’ll probably get to 4 billion CouchDB requests tomorrow, 5 March 2010. Not bad for 32 CouchDB nodes running since last summer.

References (1)

References allow you to track sources for this article, as well as articles that were written in response to this article.
  • Response
    I recently twittered about the number of requests our CouchDB instances have handled since they were last upgraded. As it was a large number it piqued some interest, so I thought I’d give a little background.

Reader Comments (10)

EXTREME PARALLELISM!

March 4, 2010 | Unregistered CommenterMUNCTIONAL

I'll take it as a sign of stability, that despite having the largest known CouchDB installation, we rarely hear anything resembling a bug report from you.

Cheers!
Chris

March 4, 2010 | Unregistered CommenterJ Chris Anderson

What is your peak requests per second? 4 billion requests in 7 months is only a 22/s average.

March 4, 2010 | Unregistered CommenterBrandon

Are these requests a mix of reads and writes, or mostly reads? How many records total?
One system I manage (with a dozen Tokyo Tyrant nodes) gets about 1B+ update requests every 2 weeks. Still not extremely high volume but pretty decent.

March 4, 2010 | Unregistered CommenterMikeD

If 98% (or is my math way of?) is GET, I'm guessing you can cache alot? Are you doing that already?

March 4, 2010 | Unregistered CommenterSimon

Chris - yep quite stable thank you ;-)
Brandon - our peak vs mean is relatively modest due to how we currently replicate - daily peaks are about 4k/s, we've run happily at 8k/s occasionally too. The average is increasing now though as more and more big systems within the BBC use the service. At present we're averaging about 2k/s, and that number is a factor of the number of databases and writes.
MikeD - 20M docs, and almost entirely reads. Most of the posts are calls for replication, we only create/update data using PUTs.
Simon - there are layers and layers of caching on our infrastructure! PHP APC, memcached, varnish, respecting eTags etc. To some applications our CouchDB-backed KV service IS a cache - once you've got your connection, responses are < 15ms (mostly single digit but that depends on the size of the doc and on current network load). Most applications treat the KV service as a large, easy to use database, and so often cache the responses as tehy see fit.

March 5, 2010 | Registered CommenterEnda Farrell

And Chris - we're STILL running 0.9.0! We're working now on upgrading via 0.9.2 to 0.10, and I guess by the time we get round to it we might do 0.11 soon after. We haven't put priority on upgrading - we haven't had the need to ;-)

March 5, 2010 | Registered CommenterEnda Farrell

How do you handle sharding ?

March 8, 2010 | Unregistered Commenterbenoitc

benoitc, we have a Java API layer sitting above - more details will be available on Wednesday's QCon where I'm speaking. We shard using the database+key in the Ketama hashing algorithm with equally weighted arcs - we have commodity hardware and all have the same spec.

This code has been in production for almost a year now, but we know that some of your code and/or Lounge might be a better long-term option for us. We are always going to need _something_ of an API wrapper in order to make CouchDB "fit" into our platform's architecture, but the less work it does the better ;-)

March 8, 2010 | Registered CommenterEnda Farrell

enda, hope to see the video soon :)

March 9, 2010 | Unregistered Commenterbenoitc

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>