I am parsing huge XHTML files and trying to work with their content: basically the words in them, their positions, and so on. I tried HashMap, ArrayList, and similar collections. All of them throw an OutOfMemoryError after loading 130347 items. What kind of data structure can hold this much data in Java?

A: 

Your question is pretty vague. But if you run out of memory then you should probably use an on-disk database instead. PostgreSQL, MySQL, HSQLDB, whatever.

intgr
Do you mean that the information I collect from the document can be written to an HSQLDB database on the local disk, with a proper schema, instead of being loaded into memory, so that I can query what I need on an as-needed basis? Since I need it only for that request, do I have to delete my inserts at the end of processing? This is quite interesting. I have not used HSQLDB for real-time applications. Could you please tell me what tradeoffs this kind of solution involves, such as performance, since I will have to insert a huge number of rows with many calls?
Rachel
Which database would you suggest for loading data temporarily for a request and clearing it out at the end of the request?
Rachel
A 10MB XML file is by no means "huge data", so a disk database is probably overkill.
intgr
+1  A: 

What you are doing now, sucking all your data into one huge structure and then processing it, is not going to work regardless of what data structure you use. Try an incremental approach where you read some data, then process it, then read some more, etc. (Actually what you'd be doing this way is creating your own special-purpose data structure that handles the processing in chunks, so my first sentence isn't really accurate.)

One way to do this might be to parse the document using SAX, which uses an event-driven approach. You could have your content handler create and store objects you construct from reading the XML elements, process them once enough have accumulated, then clear the collection.
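A minimal sketch of that read-some, process-some loop, using only the SAX classes that ship with the JDK. The class name `ChunkedWordHandler`, the batch size, and the counters are hypothetical names chosen for illustration; the "processing" step here only counts words, and real code would do its own work in `flush()`. It also assumes words never span an element boundary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects words in small batches instead of
// holding the whole document in memory.
final class ChunkedWordHandler extends DefaultHandler {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private long wordsProcessed = 0;
    private int batchesFlushed = 0;

    ChunkedWordHandler(int batchSize) {
        this.batchSize = batchSize;
    }

    // SAX may deliver one run of text in several calls, so buffer it.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Element boundaries are treated as word boundaries (an assumption).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        collectWords();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        collectWords();
    }

    @Override
    public void endDocument() {
        collectWords();
        flush(); // process whatever is left in the last partial batch
    }

    private void collectWords() {
        for (String word : text.toString().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            batch.add(word);
            if (batch.size() >= batchSize) {
                flush();
            }
        }
        text.setLength(0);
    }

    // Placeholder processing step: count the words, then free the batch.
    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        wordsProcessed += batch.size();
        batchesFlushed++;
        batch.clear();
    }

    long getWordsProcessed() {
        return wordsProcessed;
    }

    int getBatchesFlushed() {
        return batchesFlushed;
    }

    // Convenience entry point that parses an in-memory string.
    static ChunkedWordHandler parse(String xml, int batchSize) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChunkedWordHandler handler = new ChunkedWordHandler(batchSize);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

For a real file you would pass a `java.io.File` or an `InputStream` to `SAXParser.parse` instead of a string, so the document is streamed rather than loaded up front. Memory use is then bounded by the batch size, not by the size of the file.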

Nathan Hughes
Thanks, I will try SAX.
Rachel
There is another answer somewhere on this site describing how to use SAX in conjunction with an XML object-model library (probably JDOM). If I can find it, I'll add the link to this answer.
Nathan Hughes
That's great. It will be very helpful. Thanks.
Rachel
+1  A: 

Look into your virtual machine memory settings. You can modify the VM memory size via the command line if that's where you are, or via a config file if you are in some kind of server side environment.

If you are using Tomcat/Eclipse, this thread should help you: http://stackoverflow.com/questions/334102/eclipse-memory-settings-when-getting-java-heap-space-and-out-of-memory
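To check whether a changed memory setting actually reached the JVM, you can ask the runtime for its heap limit. This is only a small sketch; the class name `HeapInfo` is made up for illustration, and `Runtime.maxMemory()` reports the limit the JVM will try to use, which may differ slightly from the configured value.

```java
// Hypothetical helper: prints the maximum heap the running JVM may use.
final class HeapInfo {
    // Returns the JVM's maximum heap size in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Calling `HeapInfo.maxHeapMegabytes()` from inside the web application, rather than from a separate command-line run, shows the limit of the JVM that Tomcat is really running in.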

Zak
Good point. If you're running your app from the command line, you can pass something like -Xmx4G to allow it to use 4 gigabytes of memory.
intgr
I am running on a Tomcat server from Eclipse.
Rachel
+1  A: 

Consider using a SAX parser; it is less memory-intensive than loading the whole document into an in-memory tree.

Sean
Thanks, I will try SAX.
Rachel