Hi everybody.

I'm kinda new to XML parsing, so I'd appreciate it if you could help me with this.

I need to extract some data that's inside an XML document whose structure is:

<DWDocument DW5BasketFileName="DOCU0001.001">
  <FileInfos>
    <ImageInfos>
      <ImageInfo id="0,0,0" nPages="0">
        <FileInfo fileName="PATH_1" dwFileName="FILE_NAME_1" signedFileName="FILE_NAME_2" type="normal" length="77324" />
      </ImageInfo>
    </ImageInfos>
  </FileInfos>
  <FileDatas />
  <Section number="0" startPage="0" dwguid="d8a50daf-d4df-4012-ad0c-85e26a6e0755">
    <Metadata version="0">
      <FieldProperties>
        <TextVar length="20" field="FIELD_1" id="0">9866627</TextVar>
        <TextVar length="20" field="FIELD_2" id="1">78050830431</TextVar>
        <TextVar length="40" field="FIELD_3" id="32">GOMEZ PADILLA</TextVar>
        <TextVar length="40" field="FIELD_4" id="33">JOSSER KICO</TextVar>
        <Date field="FIELD_5" id="64">1985-07-02T00:00:00</Date>
      </FieldProperties>
    </Metadata>
  </Section>
</DWDocument>

I'm working in a Java desktop application. I want to know how to do it and, if possible, see a code example.

I need to extract the values of FIELD_1 to FIELD_4 (9866627, ...), each one into a different variable.

Thanks.

A: 

If this is the entire XML document, you might as well just use a regular expression to extract the characters between a ">" and a "<". This will save you the overhead of building a DOM document (using JDOM, for instance) or handling the callbacks from a SAX parser.
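
A minimal sketch of the regex idea, assuming the TextVar elements always use double-quoted attributes and contain plain text only (no nested tags, CDATA or entities). The class name RegexFieldSketch and the method extract are made up for illustration, and the pattern is tied to the element name from the question so that the Date element is not picked up:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldSketch {

    // Matches <TextVar ... field="NAME" ...>VALUE</TextVar>; group 1 is NAME, group 2 is VALUE.
    // Assumption: double-quoted attributes and plain text content only.
    private static final Pattern TEXT_VAR = Pattern.compile(
            "<TextVar\\b[^>]*\\bfield=\"([^\"]*)\"[^>]*>([^<]*)</TextVar>");

    // Returns field name -> text value for every TextVar found, in document order.
    public static Map<String, String> extract(String xml) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = TEXT_VAR.matcher(xml);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Trimmed fragment of the document from the question.
        String xml = "<FieldProperties>"
                + "<TextVar length=\"20\" field=\"FIELD_1\" id=\"0\">9866627</TextVar>"
                + "<TextVar length=\"40\" field=\"FIELD_3\" id=\"32\">GOMEZ PADILLA</TextVar>"
                + "<Date field=\"FIELD_5\" id=\"64\">1985-07-02T00:00:00</Date>"
                + "</FieldProperties>";
        System.out.println(extract(xml)); // prints {FIELD_1=9866627, FIELD_3=GOMEZ PADILLA}
    }
}
```

This only holds up while the file keeps exactly this shape; anything like escaped characters or reordered markup inside the element would need a real parser.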

Jeroen van Bergen
Not using a proper XML parser when parsing XML feels like a dangerous path. Apart from that, the regex would catch FIELD_5 as well.
Buhb
+3  A: 

You can use XPath:


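import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XPathExample {
    public static void main(String[] args) {
        String filename = "C:\\a.xml";
        // Select every TextVar element anywhere in the document (the Date element is skipped).
        String expression = "//TextVar";
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new File(filename));
            NodeList nn = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            for (int i = 0; i < nn.getLength(); i++) {
                Node item = nn.item(i);
                // The "field" attribute holds the field name; the element text holds its value.
                String field = item.getAttributes().getNamedItem("field").getTextContent();
                String number = item.getTextContent();
                System.out.println("field=" + field);
                System.out.println("number=" + number);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}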

Output:

field=FIELD_1
number=9866627
field=FIELD_2
number=78050830431
field=FIELD_3
number=GOMEZ PADILLA
field=FIELD_4
number=JOSSER KICO
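
Since the question asks for each value in its own variable, here is a possible variation that looks up one field at a time with an attribute predicate. It is only a sketch: the class FieldLookup and its helper methods are invented for illustration, the XML is a trimmed inline copy of the question's document so it runs without a file, and the field name is pasted straight into the XPath string, which assumes it contains no quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FieldLookup {

    // Trimmed copy of the document from the question, inlined for the demo.
    public static final String SAMPLE_XML =
            "<DWDocument><Section><Metadata><FieldProperties>"
            + "<TextVar length=\"20\" field=\"FIELD_1\" id=\"0\">9866627</TextVar>"
            + "<TextVar length=\"20\" field=\"FIELD_2\" id=\"1\">78050830431</TextVar>"
            + "<TextVar length=\"40\" field=\"FIELD_3\" id=\"32\">GOMEZ PADILLA</TextVar>"
            + "<TextVar length=\"40\" field=\"FIELD_4\" id=\"33\">JOSSER KICO</TextVar>"
            + "<Date field=\"FIELD_5\" id=\"64\">1985-07-02T00:00:00</Date>"
            + "</FieldProperties></Metadata></Section></DWDocument>";

    // Parses an XML string into a DOM Document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the text of the first TextVar whose "field" attribute equals fieldName,
    // or an empty string when no such element exists.
    // Assumption: fieldName contains no quote characters.
    public static String fieldValue(Document doc, String fieldName) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//TextVar[@field='" + fieldName + "']", doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_XML);
        String field1 = fieldValue(doc, "FIELD_1");
        String field2 = fieldValue(doc, "FIELD_2");
        String field3 = fieldValue(doc, "FIELD_3");
        String field4 = fieldValue(doc, "FIELD_4");
        System.out.println(field1 + " | " + field2 + " | " + field3 + " | " + field4);
    }
}
```

To read from a file instead, the parse step can go back to parse(new File(filename)) as in the code above.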
01
You're a genius. I'm sorry I can't vote up more :D Thanks!
Sheldon
+3  A: 

If your needs are restricted to extracting values from an XML document and no more, XPath queries would be sufficient.

The Sun JRE comes with a built-in XML parser, XSLT transformer and XPath engine. On other JREs you would need to package an XPath engine like Xalan.

A good tutorial to get you bootstrapped on XPath in Java 5 is available at IBM Developerworks.

The most important classes to start referring to in the Java API documentation would be

  • DocumentBuilderFactory
  • DocumentBuilder
  • Document
  • XPathFactory
  • XPath
  • XPathExpression
  • XPathConstants

The first three classes help you load the contents of an XML document into an object that you can later query with XPath. The last four are used to create XPath expressions and to convert the result of an expression into a suitable object in your application.
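
As a rough illustration of how those seven classes fit together, here is a small sketch. The class name JaxpWalkthrough, the method textVars and the inline sample XML are assumptions made for the example; the expression is compiled once into an XPathExpression and evaluated with XPathConstants.NODESET.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JaxpWalkthrough {

    // Returns field name -> text value for every TextVar under FieldProperties.
    public static Map<String, String> textVars(String xml) {
        try {
            // Loading: DocumentBuilderFactory -> DocumentBuilder -> Document.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            // Querying: XPathFactory -> XPath -> XPathExpression.
            XPathFactory xpf = XPathFactory.newInstance();
            XPath xpath = xpf.newXPath();
            XPathExpression expr = xpath.compile("//FieldProperties/TextVar");

            // XPathConstants tells evaluate() which result type to return.
            NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

            Map<String, String> result = new LinkedHashMap<String, String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The expression selects elements only, so this cast is safe.
                Element textVar = (Element) nodes.item(i);
                result.put(textVar.getAttribute("field"), textVar.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Trimmed fragment of the document from the question.
        String xml = "<Metadata version=\"0\"><FieldProperties>"
                + "<TextVar length=\"20\" field=\"FIELD_1\" id=\"0\">9866627</TextVar>"
                + "<TextVar length=\"20\" field=\"FIELD_2\" id=\"1\">78050830431</TextVar>"
                + "<Date field=\"FIELD_5\" id=\"64\">1985-07-02T00:00:00</Date>"
                + "</FieldProperties></Metadata>";
        System.out.println(textVars(xml)); // prints {FIELD_1=9866627, FIELD_2=78050830431}
    }
}
```

Everything used here comes from the javax.xml.parsers, javax.xml.xpath and org.w3c.dom packages, so no extra library is needed on the classpath.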

Vineet Reynolds
Thank you too! I'll check the link you provided. I thought it was necessary to add Xerces or something else to the lib folder.
Sheldon
Xerces and Xalan are already available in the Sun Java 5 JRE and above. You can find more details in the JAXP compatibility notes at http://java.sun.com/j2se/1.5.0/docs/guide/xml/jaxp/JAXP-Compatibility_150.html, where the actual packages are mentioned. However, do not rely on the implementation version, i.e. the internal packages; use the DocumentBuilder and XPathFactory classes et al. instead to avoid issues.
Vineet Reynolds
A: 

The same code written in VTD-XML, in case you are confused by so many factories:

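import com.ximpleware.*;

public class Example1 {
    public static void main(String[] argv) throws Exception {
        VTDGen vg = new VTDGen();
        // The second argument turns on namespace awareness.
        if (vg.parseFile("c:/test2.xml", true)) {
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            // Select the text node of every TextVar element.
            ap.selectXPath("//TextVar/text()");
            int i;
            // evalXPath() returns the index of the next match, or -1 when there are no more.
            while ((i = ap.evalXPath()) != -1) {
                System.out.println(" text value ==>" + vn.toString(i));
            }
        }
    }
}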

Output

 text value ==>9866627
 text value ==>78050830431
 text value ==>GOMEZ PADILLA
 text value ==>JOSSER KICO
vtd-xml-author