I am trying to parse a huge (25 GB+) Wikipedia XML dump. Any solution that will help would be appreciated, preferably one in Java.
Of course it's possible to parse huge XML files with Java, but you have to use the right kind of XML parser: for example, a SAX parser, which processes the data element by element, rather than a DOM parser, which tries to load the whole document into memory.
It's impossible to give you a complete solution because your question is very general: what exactly do you want to do with the data?
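As a starting point, here is a minimal SAX skeleton that streams the dump and counts &lt;page&gt; elements. The filename and the counting logic are placeholder assumptions; you would replace them with whatever processing you actually need:

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class WikiPageCounter {

        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PageHandler handler = new PageHandler();
            // The parser streams the file and fires callbacks per element,
            // so memory use stays constant regardless of file size.
            parser.parse(new File("enwiki-latest-pages-articles.xml"), handler);
            System.out.println("Pages seen: " + handler.pages);
        }

        static class PageHandler extends DefaultHandler {
            long pages = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("page".equals(qName)) {
                    pages++;
                }
            }
        }
    }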
I would go with StAX, as it provides more flexibility than SAX (which is also a good option).
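A minimal StAX sketch, assuming for illustration that you just want to pull every &lt;title&gt; out of the dump (the filename is a placeholder):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class WikiTitleReader {

        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            try (FileInputStream in = new FileInputStream("enwiki-latest-pages-articles.xml")) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                while (reader.hasNext()) {
                    // Unlike SAX, the application pulls events at its own pace.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the text content up to the matching end tag.
                        System.out.println(reader.getElementText());
                    }
                }
                reader.close();
            }
        }
    }

The pull model also makes it easy to skip ahead or stop early, which matters on a file this size.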
Yep, right, do not use DOM. If you only want to read a small amount of the data and store it in your own POJOs, you can also use an XSLT transformation: transform the data into a smaller XML format, which is then converted to POJOs using Castor or JAXB (XML-to-object binding libraries).
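One way to sketch that idea without the XSLT step is to combine StAX with JAXB: position a stream reader on each fragment and let JAXB unmarshal just that fragment into a POJO. This assumes a simple un-namespaced &lt;page&gt;&lt;title&gt;...&lt;/title&gt;&lt;/page&gt; structure (the real Wikipedia schema is namespaced and richer) and uses javax.xml.bind, which ships with the JDK through Java 8 but is a separate dependency from Java 11 on:

    import java.io.FileInputStream;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class JaxbFragmentDemo {

        // Hypothetical POJO matching a <page><title>...</title></page> fragment.
        @XmlRootElement(name = "page")
        public static class Page {
            public String title;
        }

        public static void main(String[] args) throws Exception {
            Unmarshaller unmarshaller =
                    JAXBContext.newInstance(Page.class).createUnmarshaller();
            XMLInputFactory factory = XMLInputFactory.newInstance();
            try (FileInputStream in = new FileInputStream("dump.xml")) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "page".equals(reader.getLocalName())) {
                        // Unmarshal only this fragment into a POJO; the rest
                        // of the document is never held in memory.
                        Page page = unmarshaller.unmarshal(reader, Page.class).getValue();
                        System.out.println(page.title);
                    }
                }
                reader.close();
            }
        }
    }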
Please share how you solved the problem so others can benefit from the approach.
Thanks.
--- EDIT ---
Check the link below for a better comparison of the different parsers. It seems that StAX is the better fit here because the application keeps control of the parsing and pulls data from the parser only when it needs to.
http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP2.html