+2  Q: 

Java: XML Parser

Hi,

I have a response XML something like this:

<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>

I want to extract the whole content from <Fromhere> to </Fromhere> into a string. Is it possible to do that with a string function or with an XML parser?

Please advise.

+1  A: 

There are a lot of them; here are some examples: http://java.sun.com/developer/codesamples/xml.html

Rahul Garg
+2  A: 

Through an XML parser. Using string functions to parse XML is a bad idea...
Besides the Sun tutorials pointed out above, you can check the DZone Refcardz on Java and XML; I found it a good, terse explanation of how to do it.
But there are probably plenty of Web resources on the topic, including on this very site.
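A minimal sketch of the parser route, assuming the response is already in a String and that you want the `<Fromhere>` element itself (tags included) rather than just its text. It uses the JDK's DOM parser and the DOM Level 3 Load and Save serializer (`LSSerializer`); the class name `FromhereDom` and the method `extractFromhere` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class FromhereDom {
    // Returns the first <Fromhere> element serialized as markup (tags included),
    // or null if the document has no such element.
    static String extractFromhere(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node fromhere = doc.getElementsByTagName("Fromhere").item(0);
        if (fromhere == null) {
            return null;
        }
        // DOM Level 3 Load and Save: serialize just this node back to text
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(fromhere);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>";
        System.out.println(extractFromhere(xml));
    }
}
```

Because the parser understands the markup, comments, CDATA sections, attributes and reformatting don't confuse it the way a text search would.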

PhiLho
+1 for pointing out it's a bad idea to use 'string functions to parse XML'.
Nick Holt
The DZone Refcardz look interesting. But seriously: requiring complete address *and phone number* to register for a "for free" service?
Joachim Sauer
A: 

This should work

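import java.util.regex.*;

// DOTALL lets '.' cross line breaks; the reluctant '.*?' stops at the first closing tag
Pattern p = Pattern.compile("<Fromhere>.*?</Fromhere>", Pattern.DOTALL);
Matcher m = p.matcher(responseString);
// find() must succeed before group() can be called
String whatYouWant = m.find() ? m.group() : null;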

It would be a little more verbose to use Scanner, but that could work too.

Whether this is a good idea is for someone more experienced than me to judge.
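For comparison, a sketch of the `Scanner` variant mentioned above. It assumes the whole response is already in `responseString` and reuses the same regex, so it inherits every weakness discussed in the comments below; `FromhereScanner` and `extract` are hypothetical names.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class FromhereScanner {
    // Still plain text matching, not XML parsing
    private static final Pattern FROMHERE =
            Pattern.compile("<Fromhere>.*?</Fromhere>", Pattern.DOTALL);

    static String extract(String responseString) {
        try (Scanner scanner = new Scanner(responseString)) {
            // A horizon of 0 means "search all remaining input"; returns null if nothing matches
            return scanner.findWithinHorizon(FROMHERE, 0);
        }
    }

    public static void main(String[] args) {
        String responseString = "<Response> <aa> <Fromhere> <a1>Content</a1> "
                + "<a2>Content</a2> </Fromhere> </aa> </Response>";
        System.out.println(extract(responseString));
    }
}
```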

FarmBoy
Hi, the comment was incomplete. Can't see anything after "work:"
Pavan
Sorry, hit enter too soon.
FarmBoy
So long as those delimiters don't appear in a comment or CDATA or something...
McDowell
I strongly discourage using String functions (or regexps) to handle XML. Doing so will work only as long as the XML has *exactly* the structure in your example; any minor change will break it (additional attributes, changed attribute order, self-closing tags, ...). It's much too fragile. Use a real XML parser.
Joachim Sauer
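To make the two comments above concrete, here is a small sketch that runs the naive pattern and a DOM parser over two made-up variations of the example: one with a commented-out copy of the tags, one with an attribute on the start tag. The class and method names (`RegexVsParser`, `regexExtract`, `parserCount`) are hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RegexVsParser {
    static final Pattern NAIVE = Pattern.compile("<Fromhere>.*?</Fromhere>", Pattern.DOTALL);

    // Text of the first regex match, or null
    static String regexExtract(String xml) {
        Matcher m = NAIVE.matcher(xml);
        return m.find() ? m.group() : null;
    }

    // Number of real <Fromhere> elements the parser sees
    static int parserCount(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Fromhere").getLength();
    }

    public static void main(String[] args) throws Exception {
        // 1. A commented-out copy of the tags fools the regex
        String commented = "<Response><!-- <Fromhere></Fromhere> --><aa>"
                + "<Fromhere><a1>Content</a1></Fromhere></aa></Response>";
        System.out.println(regexExtract(commented)); // <Fromhere></Fromhere>
        System.out.println(parserCount(commented));  // 1

        // 2. An attribute on the start tag makes the regex miss the element entirely
        String withAttribute = "<Response><aa><Fromhere id=\"x\">"
                + "<a1>Content</a1></Fromhere></aa></Response>";
        System.out.println(regexExtract(withAttribute)); // null
        System.out.println(parserCount(withAttribute));  // 1
    }
}
```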
+3  A: 

You could try an XPath approach for simplicity in XML parsing:

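import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

// Parenthesize the concatenation so getBytes() applies to the whole string
InputStream response = new ByteArrayInputStream(("<Response> <aa> "
        + "<Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> "
        + "</aa> </Response>").getBytes(StandardCharsets.UTF_8)); /* Or whatever. */

DocumentBuilder builder = DocumentBuilderFactory
        .newInstance().newDocumentBuilder();
Document doc = builder.parse(response);

XPath xpath = XPathFactory.newInstance().newXPath();
// XPath is case-sensitive: the element is "Fromhere", not "FromHere"
XPathExpression expr = xpath.compile("string(/Response/aa/Fromhere)");
String result = (String) expr.evaluate(doc, XPathConstants.STRING);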

Note that I haven't tried this code. It may need tweaking.

izb
Won't that strip the elements?
McDowell
Also, wrapping a StringReader in a StreamSource would be more encoding-agnostic.
McDowell
Thanks @izb. This worked wonderfully. Thank you.
Pavan
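On McDowell's question: yes, `string(...)` returns only the concatenated text nodes of the element (for the example, ` Content Content `), not the child elements. If you need the markup, a sketch like the following evaluates the expression as a `NODE` and serializes it with an identity `Transformer`. Following the encoding comment, it feeds the XPath engine a `StringReader` wrapped in an `org.xml.sax.InputSource` (the XPath API takes an `InputSource` rather than a `StreamSource`). `FromhereXPath` and `extract` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FromhereXPath {
    static String extract(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Ask for a NODE instead of a STRING so the child elements survive;
        // the StringReader keeps the input as characters (no byte/charset round trip)
        Node node = (Node) xpath.evaluate("/Response/aa/Fromhere",
                new InputSource(new StringReader(xml)), XPathConstants.NODE);
        if (node == null) {
            return null; // no matching element
        }
        // Identity transform: serialize the selected element back to text
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>";
        System.out.println(extract(xml));
    }
}
```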
+2  A: 

You can apply an XSLT stylesheet to extract the desired content.

This stylesheet should fit your example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/Response/aa/Fromhere/*">
     <xsl:copy>
      <xsl:apply-templates/>
     </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Apply it with something like the following (exception handling not included):

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>";
// A File (rather than a FileReader) lets the parser honour the stylesheet's encoding declaration
Source xsl = new StreamSource(new File("/path/to/file.xsl"));

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

StringWriter out = new StringWriter();
transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));

System.out.println(out.toString());

This should work with any version of Java starting with 1.4.
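If you want the `<Fromhere>` element itself (tags included) rather than only its children, a variant stylesheet can match the root and `xsl:copy-of` the element. This also avoids the whitespace text that the built-in template rules copy through in the stylesheet above. In this sketch the stylesheet is embedded as a string so no file is needed, and `xsl:output` takes the place of the `setOutputProperty` call; `FromhereXslt` and `extract` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FromhereXslt {
    // Matching "/" replaces the built-in root rule, so only the copied element is output
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:copy-of select=\"/Response/aa/Fromhere\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String extract(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>";
        System.out.println(extract(xml));
    }
}
```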

Massimiliano Fliri
You might want to set OutputKeys.OMIT_XML_DECLARATION="yes" on the transformer.
McDowell
Actually, I used the html output method in the XSL to suppress the XML declaration, but your suggestion is much better. I've included it in the answer, thanks.
Massimiliano Fliri
A: 

One option is to use a StreamFilter:

import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamReader;

// Accepts events from each <Fromhere> start tag up to the next </Fromhere> end tag
class MyFilter implements StreamFilter {
  private boolean on;

  @Override
  public boolean accept(XMLStreamReader reader) {
    final String element = "Fromhere";
    if (reader.isStartElement() && element.equals(reader.getLocalName())) {
      on = true;
    } else if (reader.isEndElement()
        && element.equals(reader.getLocalName())) {
      on = false;
      return true;
    }
    return on;
  }
}

Combined with a Transformer, you can use this to safely parse logically equivalent markup like this:

<Response>
  <!-- <Fromhere></Fromhere> -->
  <aa>
    <Fromhere>
      <a1>Content</a1> <a2>Content</a2>
    </Fromhere>
  </aa>
</Response>

Demo:

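import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

StringWriter writer = new StringWriter();

// xmlString holds the markup shown above
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader = inputFactory
    .createXMLStreamReader(new StringReader(xmlString));
reader = inputFactory.createFilteredReader(reader, new MyFilter());
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new StAXSource(reader), new StreamResult(writer));

System.out.println(writer.toString());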

This is a programmatic variation on Massimiliano Fliri's approach.

McDowell