views: 89

answers: 1

I have a large XML file (600MB+) and am developing a PHP application which needs to query this file.

My initial approach was to extract all the data from the file and insert it into a MySQL database, then query it that way. The only issue with this was that it was still slow. On top of that, the XML data gets updated regularly, meaning I need to download, parse and insert the data from the XML file into the database every time the XML file is updated.

Is it actually possible to query a 600MB file directly (for example, searching for records where TITLE="something here")? Is it possible to do this in a reasonable amount of time?

Ideally I'd like to do this in PHP, though I could also use JavaScript.

Any help and suggestions appreciated :)

+1  A: 

Constructing an XML DOM for a 600MB+ document is definitely a way to fail. What you need is a SAX-based API. SAX does not usually allow XPath to be used, but you can emulate it with imperative code.
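To illustrate, here is a minimal sketch of that approach in PHP using `XMLReader`, its built-in pull-style streaming parser. The `<record>`/`<title>` element names and the `findTitles()` helper are assumptions for the example, not names from your actual schema; only a matching record is ever expanded into a small DOM fragment, so memory stays flat regardless of file size:

```php
<?php
// Sketch: stream a large XML file with XMLReader instead of building a
// full in-memory DOM. Assumes records look like
// <record><title>...</title>...</record>; adjust names to the real schema.
function findTitles(string $file, string $search): array
{
    $matches = [];
    $reader  = new XMLReader();
    $reader->open($file);

    while ($reader->read()) {
        // React only to the opening tag of each <record> element.
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'record') {
            // Expand just this one record into a small DOM fragment;
            // the rest of the file is never held in memory.
            $node      = $reader->expand();
            $titleNode = $node->getElementsByTagName('title')->item(0);
            if ($titleNode !== null && $titleNode->nodeValue === $search) {
                $matches[] = $titleNode->nodeValue;
            }
        }
    }
    $reader->close();
    return $matches;
}
```

This is effectively emulating the XPath query `//record[title = $search]` with imperative code: the reader walks the document once, and the condition inside the loop plays the role of the predicate.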

As for the file being updated, is it possible to retrieve only the differences somehow? That would massively speed up subsequent processing.

Anton Gogolev
I've attempted to use SAX, which parses the file fine but is still relatively slow. I could really do with something for querying the file. I may give SAX another try if nothing else comes up. Thanks for answering :)
Flava
@Flava SAX-based parsing is generally more performant since it allows the file contents to be streamed. Make sure you have a big enough read buffer so that the disk I/O hit is ruled out.
Anton Gogolev