
Problem: We log things to a database. To keep disk space usage capped, we export from the database to files that can be copied off, or just plain deleted. Some power above me wants to see this as JSON.

I see a single JSON file as a single object. So in this case we'd create an object with a list of log messages. Problem is, this file could have several million log items in it, which I imagine would choke most parsers. So the only way to do it, I think, is for each log item to have its own JSON object.

This means that JSON parsers can't handle the file as is. But we could write a line parser to read in the file and push each line through a JSON parser.
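For concreteness, here is roughly the export side I have in mind, sketched in Python (the record fields and file name are just examples, not our real schema):

import json

# Rough sketch: write each log record as its own JSON object on its own line,
# so nothing ever has to parse more than one record at a time.
def export_logs(records, path):
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")  # json.dumps keeps each record on one line

export_logs(
    [{"level": "INFO", "msg": "started"}, {"level": "ERROR", "msg": "disk full"}],
    "logs.json",
)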

Does this sound correct?

I believe XML would have the same problem, but at least there we have SAX. Or we could do it as a bunch of mini-docs all prefixed by their length.

Thanks.

A: 

This means that JSON parsers can't handle the file as is. But we could write a line parser to read in the file and push each line through a JSON parser.

Does this sound correct?

That sounds reasonable... so you'd end up with a large file of lines delimited by line breaks, each line consisting of one JSON object.
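On the reading side, something like this Python sketch (the file name is just an example) only ever holds one record in memory at a time:

import json

def read_log_lines(path):
    # Stream the file line by line and hand each line to the JSON parser.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

for entry in read_log_lines("logs.json"):
    print(entry)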

Jason S
+2  A: 

That's correct. I've been unable to find a JSON parser that does not require the whole thing to be in memory at once, at least during some part of the process (I had a database dump in JSON format I needed to parse... it was a nightmare).

The common way this is currently done is either with an object style or a CSV style.

object style:

{"name":"bob","position":"ceo","start_date":"2007-08-10"}
{"name":"tom","position":"cfo","start_date":"2007-08-11"}

etc.

csv style:

["name","position","start_date"]
["bob","ceo","2007-08-10"]
["tom","cfo","2007-08-11"]

You waste a lot of disk space with the object style, but each line is self-contained.

You save disk space with the CSV style, but your data is more tightly coupled to the format, and unless you need nested data structures like:

["bill","cto","2007-08-12",{"projects":["foo","bar","baz"]}]

you might as well actually use the CSV format.
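For what it's worth, here is a rough Python sketch of reading the CSV style back: each row is zipped against the header line, which is exactly where the coupling to the format comes from (the file name is just an example):

import json

def read_csv_style(path):
    with open(path, "r", encoding="utf-8") as f:
        header = json.loads(next(f))  # first line is the header array
        for line in f:
            if line.strip():
                yield dict(zip(header, json.loads(line)))  # pair each value with its column name

for row in read_csv_style("people.json"):
    print(row["name"], row["position"])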

Freshhawk
+3  A: 

The whole idea of JSON doesn't exactly fit with storing several million entries in one file...

The whole point of JSON was to remove the overhead caused by XML. If you write each record as a JSON object, then you are back to storing overhead bits (the repeated field names) that carry no meaning. The next logical step is to write out a regular CSV file with a header record, which everything on the planet understands how to import.
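For example, Python's standard csv module will write the header record and handle quoting for you (the field and file names below are just examples):

import csv

def export_csv(records, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "position", "start_date"])
        writer.writeheader()  # header record up front
        writer.writerows(records)

export_csv(
    [{"name": "bob", "position": "ceo", "start_date": "2007-08-10"}],
    "logs.csv",
)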

If, for some reason, you have child records, then you should look at how regular EDI works.

Chris Lively
+1  A: 

Your strategy sounds right: keep each entry as a single JSON object, generate/parse them with standard JSON tools, and handle the grouping problem yourself outside JSON.

Besides dumping all the data into just one file, you may want to consider other strategies. For example, you can keep each object in a separate file, or (if that's excessive, since you say you have millions of objects) batch them up into files in reasonable groups, naming the files according to some identifier that you have for these objects: either just the primary key (so you get "0-10000", "10001-20000", etc.) or something else. E.g., for log entries, date/time would be appropriate. This way, should some poor soul need to use or examine this data in any shape someday, it's a bit more manageable. And to get these files into an archival format, just zip/compress them into one file; JSON, being text data, should compress quite well.
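As a rough Python sketch of the batching idea (the grouping and file-naming scheme here is just one possibility):

import gzip
import json

def export_batches(entries_by_date):
    # One gzip-compressed file of JSON lines per date, e.g. logs-2007-08-10.json.gz
    for date, entries in entries_by_date.items():
        with gzip.open(f"logs-{date}.json.gz", "wt", encoding="utf-8") as f:
            for entry in entries:
                f.write(json.dumps(entry) + "\n")

export_batches({
    "2007-08-10": [{"level": "INFO", "msg": "started"}],
    "2007-08-11": [{"level": "ERROR", "msg": "disk full"}],
})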

Jaanus