I am analyzing CouchDB at the moment. Is it possible to store megabytes of data per document, e.g. a JPEG image?

I understand I would need to encode the data (Base64 or something similar) in order to fit it into the JSON container.

Practical advice sought please.

+4  A: 

As Zed said in his comment, the best way to do this is to use attachments. The wiki has a section on this: http://wiki.apache.org/couchdb/HTTP%5FDocument%5FAPI#Attachments

The basic idea is like so:

{
  "_id":"attachment_doc",
  "_attachments":
  {
    "foo.txt":
    {
      "content_type":"text\/plain",
      "data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
    }
  }
}

You are correct that you should Base64-encode the attachment's contents. You can have multiple attachments per document.

NOTE from the wiki: Please note that any base64 data you send has to be on a single line of characters, so pre-process your data to remove any carriage returns and newlines.
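
As a rough illustration of the note above, here is a minimal Python sketch that builds such a document with single-line Base64 data. The helper name `build_inline_attachment_doc` is made up for this example and is not part of any CouchDB client library; it only constructs the JSON body and does not talk to a server.

```python
import base64
import json


def build_inline_attachment_doc(doc_id, filename, content_type, raw_bytes):
    """Build a document dict with one inline attachment (hypothetical helper)."""
    # base64.b64encode returns the encoded data without any line breaks,
    # unlike base64.encodebytes, which inserts a newline every 76 characters.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {
        "_id": doc_id,
        "_attachments": {
            filename: {"content_type": content_type, "data": encoded},
        },
    }


doc = build_inline_attachment_doc(
    "attachment_doc", "foo.txt", "text/plain", b"This is a base64 encoded text"
)
# JSON body that could be sent when creating the document.
body = json.dumps(doc)
```

For a JPEG you would read the file in binary mode and pass `"image/jpeg"` as the content type; keep in mind that Base64 inflates the payload by roughly a third.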

Jeremy Wall
I realize this, thanks. I am more concerned about the *size* per document, hence the "practical advice" note.
jldupont
I'm fairly sure that megabytes of data won't have a detrimental effect on the database. It shouldn't affect query times or access times, if that's what you were worried about. The only caveat with attachments is that you can't map/reduce over their contents, so if you need to query deep into them, that might be an issue.
Jeremy Wall
Note that you don't have to upload your attachments in Base64. You can create the document, and then separately add the attachment in a raw put request. See http://wiki.apache.org/couchdb/HTTP%5FDocument%5FAPI#Standalone_Attachments for details.
Brian Campbell
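
A minimal sketch of the raw PUT described in the comment above, in Python with only the standard library. The function name, database name, document id, and revision string are all made up for illustration, and `http://localhost:5984` assumes a default local install. The sketch only builds the request; sending it with `urllib.request.urlopen(req)` would need a running server and the document's real current revision.

```python
import urllib.request
from urllib.parse import quote


def build_attachment_put(base_url, db, doc_id, filename, rev, content_type, raw_bytes):
    """Build a PUT request that uploads raw bytes as a standalone attachment.

    Hypothetical helper; the body is sent as-is, with no Base64 encoding.
    """
    url = "{}/{}/{}/{}?rev={}".format(
        base_url,
        quote(db, safe=""),
        quote(doc_id, safe=""),
        quote(filename, safe=""),
        quote(rev, safe=""),
    )
    return urllib.request.Request(
        url,
        data=raw_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder bytes standing in for real JPEG file contents.
req = build_attachment_put(
    "http://localhost:5984", "photos", "holiday", "beach.jpg",
    "1-abc", "image/jpeg", b"\xff\xd8\xff\xe0 placeholder bytes",
)
```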
@brian: you should have put that in an answer; I am generous with up-votes.
jldupont
I'm kinda late getting back here, but of course I meant real and not inline attachments, exactly what Brian is talking about. I never really understood the point of inline attachments; the reason might be that I'm still living somewhere between web 1.1 and web 1.2.
Zed
+1  A: 

I have never tried big documents, but I'm using documents with big attachments (JPEGs > 10 Mpix), and that works well.

The main issue is that with such huge database sizes, replication tends to break in new and interesting ways every week.

mdorseif