views: 549
answers: 1

Hello all,

I am facing the problem of handling many XML files that are each larger than 70 MB. Validating and accessing them costs a lot of time. Now I am wondering whether the following steps could improve my application's performance.

I can compress a 70 MB XML file into a gzip file of less than 1 MB, so I could keep only the gzip files on disk. Working with the data could be done like this (a rough sketch follows the list):

  • Make the gzip file available with java.io.File (only the small file)
  • Use, for example, StringBufferInputStream and GZIPInputStream to extract the content into RAM
  • Work with the content in RAM: parse, validate, ...
  • Create a String in RAM that represents the new XML content
  • Use GZIPOutputStream to write it back to the file system (small content again)

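A minimal sketch of that round trip, assuming JAXP DOM parsing and a hypothetical file name data.xml.gz; the uncompressed XML only ever lives in RAM:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class GzipXmlRoundTrip {
        public static void main(String[] args) throws Exception {
            // hypothetical compressed file; only this small file sits on disk
            File compressed = new File("data.xml.gz");

            // decompress straight into the parser - the 70 MB never touch the disk
            Document doc;
            try (InputStream in = new GZIPInputStream(new FileInputStream(compressed))) {
                DocumentBuilder builder =
                        DocumentBuilderFactory.newInstance().newDocumentBuilder();
                doc = builder.parse(in);
            }

            // work with the DOM in RAM here: validation, changes, ...

            // serialize the (possibly modified) document back through gzip
            try (OutputStream out = new GZIPOutputStream(new FileOutputStream(compressed))) {
                Transformer t = TransformerFactory.newInstance().newTransformer();
                t.transform(new DOMSource(doc), new StreamResult(out));
            }
        }
    }
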
Can I do this, or is there a misapprehension in my thinking?

THX in advance! Hans

+1  A: 

Reading 70 MB from the HD should take no more than 1-2 seconds (depending on your hardware, of course), so if you're seeing a delay greater than, say, 4 seconds, the bottleneck is not your HD but the XML processing and whatever you do with the result.

Before coding your gzip idea (which sounds fine), you could hardcode a sample XML document into your code (yes, insert 70 MB as a single string), run your app with a nice button saying "do it!" - or wait for user input if you're in the terminal - and see how long the XML processing takes.

This approach loads your 70 megs into memory (as code) before processing, so you can see how long it really takes to consume it.

After that, if you see it's processed quickly enough, the problem is clearly the HD. If not, then you should try to optimize your XML processing.

Seb
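
A minimal sketch of Seb's measurement idea, assuming a hypothetical big.xml loaded into memory once up front (a literal 70 MB string would not compile anyway, since class-file string constants are capped at roughly 64 KB), so only the XML processing itself is timed:

    import java.io.StringReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.xml.sax.InputSource;

    public class ProcessFromRam {
        public static void main(String[] args) throws Exception {
            // read the document into a String once, before the measurement starts
            String xml = new String(Files.readAllBytes(Paths.get("big.xml")),
                                    StandardCharsets.UTF_8);

            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            long start = System.currentTimeMillis();
            builder.parse(new InputSource(new StringReader(xml))); // no disk I/O in here
            System.out.println("XML processing alone: "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
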
OK, I have to play with some scenarios of course, and I will test yours too.
The origin of my problem is that there are many accesses to big files. For example, on application startup my navigator has to check (validate) all files within a folder in order to show the right icon decoration or report invalid files.
Furthermore, I cannot hold every parsed file in memory as a Java object, because that would blow up the RAM.
Disk fragmentation can also be to blame... you'll have to play around :)
Seb
If you have a problem holding the XML files in RAM, then gzipping them to and from disk won't help. Can you parse them into a more efficient memory structure rather than an XML DOM? Alternatively, at the risk of redefining your problem, can you use SAX?
Richard A
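
A minimal sketch of the SAX alternative Richard A mentions, again assuming a hypothetical data.xml.gz; the document is streamed, so neither the uncompressed bytes nor a DOM tree is ever held in memory as a whole:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxOverGzip {
        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

            try (InputStream in = new GZIPInputStream(new FileInputStream("data.xml.gz"))) {
                parser.parse(in, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        // react to each element as it streams past
                        // (count it, check it, copy the parts you need, ...)
                    }
                });
            }
        }
    }
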
You need to locate where the performance problem is before you try to address it. Simple method: get the value of System.currentTimeMillis() at various key points (before and after reading the 70 MB file, before and after parsing the XML, etc.). You cannot optimise until you know where the inefficiency is!
KarstenF
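
A minimal sketch of that measurement, assuming a hypothetical big.xml; the printed deltas show whether the time goes into reading the file or into parsing the XML:

    import java.io.ByteArrayInputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    public class WhereDoesTheTimeGo {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            long t0 = System.currentTimeMillis();
            byte[] raw = Files.readAllBytes(Paths.get("big.xml")); // read the big file
            long t1 = System.currentTimeMillis();

            builder.parse(new ByteArrayInputStream(raw));          // parse the XML
            long t2 = System.currentTimeMillis();

            System.out.println("reading : " + (t1 - t0) + " ms");
            System.out.println("parsing : " + (t2 - t1) + " ms");
        }
    }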