views: 350
answers: 3
What's the best way to store large JSON files in a database? I know about CouchDB, but I'm pretty sure that won't support files of the size I'll be using.

I'm reluctant to just read them off disk because of the time required to read and then update them. The file is an array of ~30,000 elements, so I think storing each element separately in a traditional database would kill me when I try to select them all.

+1  A: 

You don't really have a wide variety of choices here: you can cache them in RAM using something like memcached, or push them to disk and read and write them with a database (an RDBMS like PostgreSQL/MySQL, or a document-oriented database like CouchDB). The only real alternative is a hybrid system that caches the most frequently accessed documents in memcached for reading, which is how a lot of sites operate.
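
A minimal sketch of that read-through hybrid, assuming the python-memcached and psycopg2 packages, a hypothetical documents(id, body) table, and a memcached instance whose item-size limit has been raised above the default 1 MB so a 2+ MB document fits:

    import json
    import memcache
    import psycopg2

    mc = memcache.Client(['127.0.0.1:11211'])
    pg = psycopg2.connect(dbname='docs')

    def get_document(doc_id):
        key = 'doc:%s' % doc_id
        cached = mc.get(key)
        if cached is not None:
            return json.loads(cached)
        cur = pg.cursor()
        cur.execute('SELECT body FROM documents WHERE id = %s', (doc_id,))
        body = cur.fetchone()[0]
        mc.set(key, body, time=300)      # keep the hot copy in RAM for 5 minutes
        return json.loads(body)

    def update_document(doc_id, doc):
        body = json.dumps(doc)
        cur = pg.cursor()
        cur.execute('UPDATE documents SET body = %s WHERE id = %s', (body, doc_id))
        pg.commit()
        mc.delete('doc:%s' % doc_id)     # invalidate so the next read repopulates the cache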

2+ MB isn't a massive deal for a database, and provided you have plenty of RAM it will do an intelligent enough job of caching and using your RAM effectively. Do you know when and how often these documents are accessed, and how many users you have to serve?

Phil
Yes, quite often I will have to serve the entire document, and occasionally individual items in the JSON document will have to be updated. I'm guessing caching to disk and then making changes in RAM somehow is the way to go, then.
Paul
+2  A: 

If you intend to access specific elements one (or several) at a time, there's no way around breaking the big JSON into traditional DB rows and columns.
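
For illustration, a rough sketch of that split using sqlite3 (any RDBMS would do; the table name and layout are made up). Reassembling all ~30,000 rows is a single ordered SELECT, which is usually cheaper than it sounds:

    import json
    import sqlite3

    conn = sqlite3.connect('elements.db')
    conn.execute('CREATE TABLE IF NOT EXISTS elements (idx INTEGER PRIMARY KEY, body TEXT)')

    def store_array(items):
        # one row per element, keyed by its position in the original array
        conn.executemany('INSERT OR REPLACE INTO elements (idx, body) VALUES (?, ?)',
                         ((i, json.dumps(item)) for i, item in enumerate(items)))
        conn.commit()

    def load_array():
        # rebuild the full array with a single ordered SELECT
        rows = conn.execute('SELECT body FROM elements ORDER BY idx')
        return [json.loads(body) for (body,) in rows]

    def update_element(i, item):
        # touch just one element without rewriting the whole document
        conn.execute('UPDATE elements SET body = ? WHERE idx = ?', (json.dumps(item), i))
        conn.commit()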

If you'd like to access it in one shot, you can convert it to XML and store that in the DB (maybe even compressed - XML is highly compressible). Most DB engines support storing an XML object. You can then read it in one shot and, if needed, translate it back to JSON using forward-only parsing approaches like SAX, or any other efficient XML-reading technique.

But as @therefromhere commented, you could always save it as one big string (again, I would check whether compressing it buys you anything).
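
As a sketch of the one-big-string idea with compression, here is the JSON stored as a single zlib-compressed blob; sqlite3 is just a stand-in for whatever database is in use, and the table name is made up:

    import json
    import sqlite3
    import zlib

    conn = sqlite3.connect('blobs.db')
    conn.execute('CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body BLOB)')

    def save_doc(doc_id, doc):
        # serialize once and compress; JSON text tends to compress very well
        compressed = zlib.compress(json.dumps(doc).encode('utf-8'))
        conn.execute('INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)',
                     (doc_id, compressed))
        conn.commit()

    def load_doc(doc_id):
        # read the whole document back in one shot
        (body,) = conn.execute('SELECT body FROM docs WHERE id = ?', (doc_id,)).fetchone()
        return json.loads(zlib.decompress(body).decode('utf-8'))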

Traveling Tech Guy
+3  A: 

I have lots of documents in CouchDB that exceed 2 MB and it handles them fine. Those limits are outdated.

The only caveat is that the default JavaScript view server has a pretty slow JSON parser, so view generation can take a while with large documents. You can use my Python view server with a C-based JSON library (jsonlib2, simplejson, yajl), or use the built-in Erlang views, which don't even hit JSON serialization, and view generation will be plenty fast.
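
To make the document-storage side concrete, a rough sketch of storing and updating a large document over CouchDB's plain HTTP API, assuming a local CouchDB on the default port and the requests library; the database and document names are invented:

    import json
    import requests

    COUCH = 'http://localhost:5984'
    DB = 'bigdocs'

    requests.put('%s/%s' % (COUCH, DB))   # create the database (409 if it already exists)

    # Store the whole ~30,000-element array as one document.
    elements = [{'n': i} for i in range(30000)]
    requests.put('%s/%s/myarray' % (COUCH, DB),
                 data=json.dumps({'elements': elements}))

    # Update a single element: fetch the document, modify it in memory, and PUT
    # it back; the _rev included in the GET response is what lets CouchDB
    # accept the update.
    doc = requests.get('%s/%s/myarray' % (COUCH, DB)).json()
    doc['elements'][42] = {'n': 42, 'updated': True}
    requests.put('%s/%s/myarray' % (COUCH, DB), data=json.dumps(doc))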

mikeal