Use your HEAD - checking CouchDB document existence

March 1, 2013 | 2 min Read

One common task when working with CouchDB is to find out whether a document with a given ID exists. A simple solution is to send an HTTP GET request with the ID to CouchDB and check the response’s HTTP status code.

A GET request, executed for example with curl,

curl https://localhost:5984/mydatabase/mydocumentid

will return the document with an HTTP status code 200 if it is successfully found. If the document does not exist, an HTTP status code 404 will be returned, with the following response body:

{"error":"not_found","reason":"missing"}

While this works well, it is not the most efficient solution. If the request is successful, the whole document is read from the database, put into the HTTP response and sent over the network, even though the receiver is not interested in the document itself at all. In many cases this might not be a problem, but imagine a document of several megabytes sent over a slow internet connection.

It can be worse when you’re using one of the many Java APIs for CouchDB. They usually offer high-level APIs for GET requests which parse the JSON response body automatically, so even more time is lost turning the JSON string into a Java object which isn’t needed.

A better approach is to use HTTP HEAD requests to check for a document’s existence. HEAD requests return responses which only have header information, no body. Hence responses are always small. Additionally, the handling of the request on the CouchDB side can be done very efficiently. To fill the response’s header, only document metadata like the revision number or size need to be determined. There is no need to read the document itself.

A HEAD request with curl

curl --head https://localhost:5984/mydatabase/mydocumentid

returns (in the case of success) a response like

HTTP/1.1 200 OK
Server: CouchDB/1.2.0 (Erlang OTP/R15B)
ETag: "1335-788fe7a4e1323206fcd0df76dcabd163"
Date: Thu, 28 Feb 2013 19:41:37 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 2705
Cache-Control: must-revalidate
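
In Java, the same check can be sketched with nothing but the JDK's HttpURLConnection. The class and method names below (DocumentExistence, documentExists, existsFromStatus) are made up for illustration. Treating every status other than 200 and 404 as an error is an assumption about what a caller would want, not something CouchDB prescribes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class; not part of any CouchDB client library.
final class DocumentExistence {

    // Maps the status code of a HEAD response to an existence answer.
    // Assumption: anything other than 200 or 404 (for example 401 or 500)
    // is reported as an error rather than as "document missing".
    static boolean existsFromStatus(int status) {
        if (status == HttpURLConnection.HTTP_OK) {
            return true;
        }
        if (status == HttpURLConnection.HTTP_NOT_FOUND) {
            return false;
        }
        throw new IllegalStateException("Unexpected HTTP status: " + status);
    }

    // Sends a HEAD request to the given document URL. No response body is
    // transferred, so only the status line and headers have to be read.
    static boolean documentExists(String documentUrl) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(documentUrl).toURL().openConnection();
        try {
            connection.setRequestMethod("HEAD");
            return existsFromStatus(connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocumentExistence <document-url>");
            return;
        }
        System.out.println(documentExists(args[0]));
    }
}
```

Passing the document URL from the curl example as the single argument should print true or false, depending on whether the document exists. Keeping the status mapping in its own method makes that decision easy to adjust, for instance if a missing database should be handled differently from a missing document.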

Note that some of the Java APIs (Ektorp, LightCouch) offer methods to check for a document’s existence (CouchDbConnector.contains() and CouchDbClient.contains(), respectively) which use HEAD requests internally. Others (JCouchDB, JRelax, CouchDB4J) don’t. Fortunately, they’re open source, and it’s quite straightforward to implement the missing methods yourself using the underlying Apache HttpComponents or Restlet APIs.