Hi,

I am trying to process fairly large JSON files (possibly up to 200 MB). The structure of each file is basically an array of objects.

So something along the lines of:

[
  {"property":"value", "property2":"value2"},
  {"prop":"val"},
  ...
  {"foo":"bar"}
]

Each object has arbitrary properties and does not necessarily share them with the other objects in the array.

I want to process each object in the array, and since the file is potentially huge, I cannot slurp its whole content into memory, decode the JSON, and iterate over the resulting PHP array.

So ideally I would like to read the file, fetch enough info for each object and process it. A SAX-type approach would be OK if there was a similar library available for JSON.

Any suggestions on how best to deal with this problem?

A: 

There is http://github.com/sfalvo/php-yajl/, but I haven't used it myself.

Alexandre Jasmin
The latest commit comment doesn't really help earn my trust => "Arrays are crashing for no observable reason. "
Guillaume Bodi
Presumably, that last commit fixed that. So you arrived just in time :-)
Thilo
@Thilo: Are you working on the project?
Guillaume Bodi
No. But all my commit messages also look like that: Description of the bug that was fixed.
Thilo
I see :) Usually mine are clear about the fact that I solved the bug, though.
Guillaume Bodi
+1  A: 

Something like this exists, but only for C++ and Java. Unless you can access one of these libraries from PHP, there is no implementation for this in PHP other than json_decode(), as far as I know. However, if the JSON is structured that simply, it is easy to read the file up to the next } and then process the text read so far with json_decode(). It is better to do that buffered, though: read 10 KB and split it by }. If no } is found, read another 10 KB; otherwise process the values found. Then read the next block, and so on.

joni
Well, the objects can potentially have objects as properties. I have no control over the content of the objects themselves. Sounds like a job for a lexer/parser, or I could slice it by hand by counting `{` and `}`'s. I'd like to avoid getting down to that, though.
Guillaume Bodi
Yeah, I can understand that you don't want to do this by hand.
joni
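
For what it's worth, the "counting `{` and `}`" idea from the comments above can be sketched in plain PHP without too much pain. This is only a minimal sketch under some assumptions: the file is one JSON array whose top-level elements are all objects, PHP 7.3+ is available (for `JSON_THROW_ON_ERROR`), and the names `streamJsonObjects`, `$chunkSize`, and `data.json` are made up for illustration. It reads the file in chunks, tracks nesting depth while ignoring braces that appear inside string literals, and hands each complete object to json_decode() one at a time.

```php
<?php
/**
 * Yield each top-level object of a JSON array file, one at a time.
 * Hypothetical helper: assumes every element of the array is an object.
 */
function streamJsonObjects(string $path, int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $depth = 0;        // current nesting depth of { }
    $inString = false; // are we inside a JSON string literal?
    $escaped = false;  // was the previous string character a backslash?
    $buffer = '';      // text of the object currently being collected

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $length = strlen($chunk);
            for ($i = 0; $i < $length; $i++) {
                $char = $chunk[$i];
                if ($depth > 0) {
                    $buffer .= $char; // inside an object: keep every byte
                }
                if ($inString) {
                    // Braces inside strings must not affect the depth.
                    if ($escaped) {
                        $escaped = false;
                    } elseif ($char === '\\') {
                        $escaped = true;
                    } elseif ($char === '"') {
                        $inString = false;
                    }
                    continue;
                }
                if ($char === '"') {
                    $inString = true;
                } elseif ($char === '{') {
                    if ($depth === 0) {
                        $buffer = '{'; // start of a new top-level object
                    }
                    $depth++;
                } elseif ($char === '}') {
                    $depth--;
                    if ($depth === 0) {
                        // One complete object: decode only this slice.
                        yield json_decode($buffer, true, 512, JSON_THROW_ON_ERROR);
                        $buffer = '';
                    }
                }
            }
        }
    } finally {
        fclose($handle);
    }
}
```

Usage would then look something like `foreach (streamJsonObjects('data.json') as $object) { /* process $object */ }`, so only one decoded object (plus one read chunk) is held in memory at a time. The scanner state is kept across chunk boundaries, so an object split over two reads is still reassembled correctly. Appending byte by byte is not fast, so a real streaming parser is probably the better choice if one is available.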