views:

264

answers:

7

I have a somewhat large file (~500 KiB) with many small elements (~3000). I want to pick one element out of this file and parse it into a Java class.

Attributes Simplified

<xml>
<attributes>
  <attribute>
     <id>4</id>
     <name>Test</name>
  </attribute>

  <attribute>
     <id>5</id>
     <name>Test2</name>
  </attribute>

<!--3000 more go here-->
</attributes>
</xml>

class Simplified

public class Attribute{
  private int id;
  private String name;

  //Mutators and accessors

}

I kinda like XPath, but people suggested StAX and even VTD-XML. What should I do?

+3  A: 

500 KiB is not that large. If you like XPath, go for it.
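For illustration, a minimal sketch of the XPath approach using the JDK's built-in javax.xml.xpath API. The element names are taken from the question; the inline XML string stands in for the actual 500 KiB file:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        // Stand-in for the real file; in practice parse a File or InputStream
        String xml = "<xml><attributes>"
                + "<attribute><id>4</id><name>Test</name></attribute>"
                + "<attribute><id>5</id><name>Test2</name></attribute>"
                + "</attributes></xml>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select the single <attribute> element whose <id> is 5
        Element match = (Element) xpath.evaluate(
                "/xml/attributes/attribute[id='5']", doc, XPathConstants.NODE);

        // Read a child element's text relative to the matched node
        String name = xpath.evaluate("name", match);
        System.out.println(name); // prints "Test2"
    }
}
```

Note that this still builds a DOM tree under the hood, so the whole document ends up in memory; for a 500 KiB file that is usually acceptable.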

extraneon
Okay, thanks ^^, but that's just a personal preference. I'd like to know what other options are out there. What would be the best parser for this situation?
Timo Willemsen
+1  A: 

Whenever I have to deal with XML I just use XMLBeans. It may be overkill for what you are after, but it makes life easy (once you know how to use it).

TofuBeer
Thanks! I read the first few examples I could find about it. I think I will use this in future projects, but for this it does seem like overkill because the data is very simple. There is just one kind of object I want to retrieve.
Timo Willemsen
+1  A: 

If you don't care about performance at all, Apache Digester may be useful for you, as it will already initialize the Java objects for you after you define the rules.

Uri
Yeah, I like digester for mapping to Java classes - works quite nicely - and easy to expand if the XML/classes evolve over time.
monojohnny
+2  A: 

Avoid anything that is a DOM parser - no need for that, especially with a large-ish file and relatively simple XML syntax.

Which specific one to use, sorry, I haven't used them, so I can't give you any more guidance than to look at your licensing, performance, and support (for questions).

ssnyder
Agree that you don't really need DOM here - you could probably implement this in SAX, which is generally faster. Using DOM might make it easier to map the data to the class, but in that case you might as well go the whole hog and use Digester to do the work for you (or XMLBeans - I haven't used that personally, so I can't comment).
monojohnny
+2  A: 

My favorite XML library is Dom4j

Itay
Mine too. It's very approachable compared with the JDK APIs.
Drew Wills
+2  A: 

I have commented above as well, because there are a few options to consider - but by the sound of your initial description, I think you could get away with a simple SAX processor here. It will probably run faster than the other mechanisms, although the code that maps the data to your Java class might not look as pretty.

There is an example here, which matches quite closely with your example:

http://www.informit.com/articles/article.aspx?p=26351&amp;seqNum=6
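To make the idea concrete, here is a minimal sketch of a SAX handler for the XML in the question, using the JDK's built-in parser. The `Attribute` class and element names come from the question; the `parse` helper and inline XML string are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// The Attribute class from the question (mutators/accessors omitted)
class Attribute {
    int id;
    String name;
}

public class SaxExample {

    // Collect one Attribute object per <attribute> element
    static List<Attribute> parse(InputStream in) throws Exception {
        final List<Attribute> result = new ArrayList<Attribute>();

        DefaultHandler handler = new DefaultHandler() {
            Attribute current;
            StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName,
                                     org.xml.sax.Attributes atts) {
                if ("attribute".equals(qName)) {
                    current = new Attribute();
                }
                text.setLength(0); // reset buffer for the next text node
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (current == null) {
                    return;
                }
                if ("id".equals(qName)) {
                    current.id = Integer.parseInt(text.toString().trim());
                } else if ("name".equals(qName)) {
                    current.name = text.toString().trim();
                } else if ("attribute".equals(qName)) {
                    result.add(current);
                    current = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real file; in practice pass a FileInputStream
        String xml = "<xml><attributes>"
                + "<attribute><id>4</id><name>Test</name></attribute>"
                + "<attribute><id>5</id><name>Test2</name></attribute>"
                + "</attributes></xml>";
        List<Attribute> attrs = parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(attrs.size() + " attributes parsed");
    }
}
```

Since SAX streams the document, memory use stays proportional to what you keep, not to the file size - and if you only want one element, you can stop collecting once you've seen it.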

monojohnny
+1, that's how I'd have done it too. Simple and low-overhead. For anything more complicated, XPath is an easy fallback.
PSpeed
Thanks for this. I'll use this =)
Timo Willemsen
+3  A: 

I kinda like XPath, but people suggested StAX and even VTD-XML. What should I do?

DOM, SAX and VTD-XML are three different ways to parse an XML document, listed roughly in order of increasing memory efficiency. DOM needs over 5 times as much memory as the XML file itself. SAX is only a bit more efficient. VTD-XML uses only a little more memory than the size of the XML file, about 1.2 times.

XPath is just a way to select elements and/or data from a (parsed) XML document.

In other words, you can use XPath in combination with any of those XML parsers, so this is not really a concern after all. If you just want the best memory efficiency and performance, go for VTD-XML.

BalusC
Technically speaking, SAX has very little overhead in the parser. It's what the code does with what it's parsing that will use most of the memory. As evidence, if your data handler does not instantiate any objects then you can use a SAX parser to parse XML many times greater than will fit in available RAM.
PSpeed