views: 41
answers: 1
Hello,

I'd like to create a program that generates content from very large XML files, upwards of 500 MB in size. I'll need to pull data from the files at various times, but the user is willing to wait a bit, and everything runs on the local machine.

I was wondering if anyone had any advice regarding:

- Implementation languages
- Whether XPath is enough for light querying on an (admittedly huge) database
- Any other advice

I probably only need to use less than 1% of the data, and I can't do any processing beforehand to prepare the file.

Any tips?

In response to the comment: I could break the file up, but only by reading it in and writing it out again, so not really. I only use the file once, to generate this 'content' from select (and nondeterministically chosen) entries in the given XML file. Then I never need the file again.

+1  A: 

I saw this link on Stack Overflow, which is somewhat related to this.

Raghuram
Thanks, I saw that too. I didn't want to go with Java, but I think that might be the best option. I'll check it out and report back!
mtc06
Okay, that did it - sort of. I settled on Python in the end and used a SAX parser. I suspect I will end up building a temporary database eventually, since the more I develop this application the more I want to run rich searches over the data, but for now - SAX is the way to go!
mtc06
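
For reference, a minimal sketch of the SAX approach described above, using Python's built-in xml.sax module. The element name "record", its "id" attribute, the example ids, and the file name "huge.xml" are hypothetical placeholders; the real file's structure will differ.

    import xml.sax


    class RecordHandler(xml.sax.ContentHandler):
        """Collects the text of every <record> element whose id attribute is wanted."""

        def __init__(self, wanted_ids):
            super().__init__()
            self.wanted_ids = wanted_ids   # ids of the entries we actually need
            self.matches = {}              # id -> collected text
            self._current_id = None
            self._buffer = []

        def startElement(self, name, attrs):
            # Start buffering only when we hit a record we care about.
            if name == "record" and attrs.get("id") in self.wanted_ids:
                self._current_id = attrs.get("id")
                self._buffer = []

        def characters(self, content):
            if self._current_id is not None:
                self._buffer.append(content)

        def endElement(self, name):
            # Close out the record and store whatever text it contained.
            if name == "record" and self._current_id is not None:
                self.matches[self._current_id] = "".join(self._buffer).strip()
                self._current_id = None


    if __name__ == "__main__":
        handler = RecordHandler(wanted_ids={"42", "1001"})
        xml.sax.parse("huge.xml", handler)   # streams the file; never loads it whole
        print(handler.matches)

Because the handler only keeps the entries it was asked for, memory use stays flat even when the input is hundreds of megabytes, which fits the "read once, use less than 1% of the data" case above.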