views: 309
answers: 1
Currently, I'm using LIBXML::SAXParser::Callbacks to parse a large XML file containing data for 140,000 products. I'm using a rake task to import the data for these products into my Rails app.

My last import took just under 10 hours to complete:

rake asi:import_products --trace  26815.23s user 1393.03s system 80% cpu 9:47:34.09 total

The problem with the current implementation is that the complex dependency structure in the XML means I need to keep track of the entire product node to know how to parse it properly.

Ideally, I'd like a way to process each product node by itself and have the ability to use XPath on it. The file size prevents us from using any method that requires loading the entire XML file into memory. I cannot control the format or size of the original XML, and I have at most 3 GB of memory I can use on the process.
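For illustration, here's roughly the shape of what I'm after, sketched with Nokogiri's pull parser (element names like Product and Name are stand-ins, since my actual snippet isn't shown here): the file is streamed, but each product is expanded one at a time into a small in-memory document where full XPath is available.

    require 'nokogiri'

    File.open('products.xml') do |io|
      Nokogiri::XML::Reader(io).each do |node|
        # Only react to the opening tag of each product element.
        next unless node.name == 'Product' &&
                    node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

        # Expand just this product into its own small document.
        # Full XPath works here, but memory stays bounded because
        # only one product is held at a time.
        product = Nokogiri::XML(node.outer_xml)
        name  = product.at_xpath('/Product/Name')&.text
        price = product.at_xpath('/Product/Pricing/Base')&.text
        # ... hand off to the existing import logic ...
      end
    end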

Is there a better way than my current implementation?

Current Rake Task code:

Snippet of the XML file:

+1  A: 

Can you fetch the whole file first? If so, then I'd suggest splitting the XML file into smaller chunks (say, 512 MB or so) so you could parse several chunks at one time (one per core), since I believe you have a modern multi-core CPU. Regarding the invalid or malformed XML at the chunk boundaries - just append or prepend the missing XML with simple string manipulation.
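Something like this rough sketch (the tag names, file names, chunk size, and one-element-per-line layout are all placeholders for your actual schema):

    # Split the big file on product boundaries and repair each chunk
    # by re-adding the XML declaration and root wrapper it lost.
    HEADER    = %(<?xml version="1.0"?>\n<Products>\n)
    FOOTER    = %(</Products>\n)
    MAX_BYTES = 512 * 1024 * 1024  # ~512 MB per chunk

    index, bytes = 0, 0
    out = File.open("chunk_#{index}.xml", 'w')
    out.write(HEADER)

    File.foreach('products.xml') do |line|
      next if line =~ /<\?xml|<\/?Products>/  # skip the original wrapper
      out.write(line)
      bytes += line.bytesize
      # Only cut a chunk once the current product element is closed.
      if bytes >= MAX_BYTES && line.include?('</Product>')
        out.write(FOOTER)
        out.close
        index += 1
        bytes = 0
        out = File.open("chunk_#{index}.xml", 'w')
        out.write(HEADER)
      end
    end

    out.write(FOOTER)
    out.close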

You can also try profiling your callback method. It's a big chunk of code, and I'm pretty sure there is at least one bottleneck in there whose removal could save you a few minutes.
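For example, with the ruby-prof gem (ImportProducts.parse_file is a placeholder for your real entry point; profile a small sample file, not all 140k products):

    require 'ruby-prof'

    result = RubyProf.profile do
      ImportProducts.parse_file('sample_products.xml')
    end

    # Flat report of where the time actually goes in the callbacks.
    printer = RubyProf::FlatPrinter.new(result)
    printer.print($stdout, min_percent: 1)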

Eimantas
Yes, the code isn't pretty, but the speed is only a minor issue. The big issue is handling the dependencies within some of the pricing and criteria in the XML. Since it is just a big list of independent products, though, I could potentially split the file up a bit and process multiple files at a time (see the sketch below). That isn't a bad idea.
DBruns
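A rough sketch of the "one per core" processing over the split files, with import_file standing in for the existing per-file import logic and the worker count chosen by assumption:

    # Run one child process per chunk, a few at a time.
    CONCURRENCY = 4  # roughly one worker per core

    Dir.glob('chunk_*.xml').each_slice(CONCURRENCY) do |batch|
      pids = batch.map do |path|
        Process.fork { import_file(path) }
      end
      pids.each { |pid| Process.waitpid(pid) }
    end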