I have an XML file that I am parsing, and it contains the following tag:

<desc>
/wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1
</desc>

I don't control the format of this XML file, but I need to interpret the desc content as a partial URL that I will later append to a base URL to retrieve a new file.
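A minimal sketch of that last step, assuming a hypothetical helper joinUrl and a placeholder base URL (http://example.com/). The desc text in the sample is wrapped in line breaks, so it is trimmed before being appended, and a doubled "/" is avoided. Only String methods that are also in CLDC are used.

```java
public class UrlJoin {

    // Hypothetical helper: joins a base URL with the partial path taken from <desc>.
    // Trims the surrounding whitespace/newlines and makes sure exactly one '/' separates the parts.
    static String joinUrl(String base, String descText) {
        String path = descText.trim();
        if (base.endsWith("/") && path.startsWith("/")) {
            return base + path.substring(1);
        }
        if (!base.endsWith("/") && !path.startsWith("/")) {
            return base + "/" + path;
        }
        return base + path;
    }

    public static void main(String[] args) {
        String desc = "\n/wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1\n";
        // prints http://example.com/wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1
        System.out.println(joinUrl("http://example.com/", desc));
    }
}
```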

When I parse this, the desc element has one child, a text node with the value:

/wap/news/text.jsp?sid=242

but the rest of the line is parsed as 6 child nodes of that text node, with the values:

&
nid=5662369
&
cid=5038
&
scid=-1

How do I make the parser treat this as a single text node instead of interpreting the '&' characters as child nodes?

The relevant parsing code is below.

// Open the HTTP connection (J2ME Generic Connection Framework) and get the response stream
HttpConnection c = (HttpConnection) Connector.open(inURL.toString(), Connector.READ);
is = c.openInputStream();
// Configure the DOM parser
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringElementContentWhitespace(true);
factory.setAllowUndefinedNamespaces(true);
// Parse the whole stream into a DOM document
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(is);

This is J2ME code on a BlackBerry, so I'm quite limited in the APIs I have available.

Answer (+3):

& is a special character in XML. It needs to be escaped as &amp;.

If something is producing the file above, it is not producing well-formed XML.

Anon.
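To illustrate the answer, here is a small sketch using the standard Java SE JAXP parser (javax.xml.parsers, and getTextContent from DOM Level 3), which is an assumption: those APIs are not necessarily available on a BlackBerry. A strict parser rejects the raw desc line outright, while the escaped version parses into one text value that is exactly the original URL path. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpDemo {

    // Parses a small XML string and returns the trimmed text of its root element,
    // or null if the parser rejects the input as not well-formed.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
                          .getDocumentElement().getTextContent().trim();
        } catch (SAXException e) {
            return null; // malformed input, e.g. a bare '&'
        } catch (Exception e) {
            throw new RuntimeException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String raw = "<desc>\n/wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1\n</desc>";
        String escaped = "<desc>\n/wap/news/text.jsp?sid=242&amp;nid=5662369&amp;cid=5038&amp;scid=-1\n</desc>";
        System.out.println(rootText(raw));     // null: rejected as not well-formed
        System.out.println(rootText(escaped)); // /wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1
    }
}
```

Note that the parser hands back "&" in the text, not "&amp;": escaping only affects how the character is written in the file.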
Valid or not, sometimes we are stuck dealing with what we have at hand.
whatnick
Anon.
@whatnick - expecting an XML parser to do something sensible with invalid XML is analogous to expecting a Java compiler to do something sensible with C.
Stephen C
Yes, I can't do anything about the XML. It's not mine and they won't fix it. Slurping it into a buffer and fixing it before parsing sounds like the best option. I'll give that a shot. Thanks.
Maven
Works like a charm. I just slurp the file into a buffer and do an XML escape pass before sending it to the parser. There were many more problems than what I showed. Their files are a mess, but it's working now. Thanks for the tip.
Maven
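A minimal sketch of the slurp-and-escape pass described above; this is not the poster's actual code, and the class and method names are hypothetical. It reads the whole stream into memory, then rewrites every bare '&' as "&amp;" while leaving existing references (the five predefined entities and numeric references like &#38; or &#x26;) alone. It works on raw bytes, assuming an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (not UTF-16), and uses only java.io classes that CLDC also provides. It does not understand CDATA sections or comments, where a literal '&' is legal and would be wrongly escaped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class AmpFixer {

    private static final String[] NAMED = { "amp;", "lt;", "gt;", "quot;", "apos;" };

    // Reads the whole stream into a byte array.
    static byte[] slurp(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // True if the bytes starting at i equal the ASCII string s.
    static boolean matches(byte[] b, int i, String s) {
        if (i + s.length() > b.length) return false;
        for (int k = 0; k < s.length(); k++) {
            if (b[i + k] != s.charAt(k)) return false;
        }
        return true;
    }

    static boolean isDigit(byte c, boolean hex) {
        return (c >= '0' && c <= '9') || (hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')));
    }

    // True if the text starting at i (just after an '&') is already a valid reference.
    static boolean isReference(byte[] b, int i) {
        for (int k = 0; k < NAMED.length; k++) {
            if (matches(b, i, NAMED[k])) return true;
        }
        if (i < b.length && b[i] == '#') {
            int j = i + 1;
            boolean hex = j < b.length && b[j] == 'x';
            if (hex) j++;
            int digitsStart = j;
            while (j < b.length && isDigit(b[j], hex)) j++;
            return j > digitsStart && j < b.length && b[j] == ';';
        }
        return false;
    }

    // Replaces every bare '&' with "&amp;", leaving existing references untouched.
    static byte[] escapeBareAmpersands(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 64);
        byte[] ampTail = { 'a', 'm', 'p', ';' };
        for (int i = 0; i < in.length; i++) {
            out.write(in[i]);
            if (in[i] == '&' && !isReference(in, i + 1)) {
                out.write(ampTail, 0, ampTail.length);
            }
        }
        return out.toByteArray();
    }

    // Drop-in wrapper: pass the result to DocumentBuilder.parse instead of the raw stream.
    static InputStream fixAmpersands(InputStream in) throws IOException {
        return new ByteArrayInputStream(escapeBareAmpersands(slurp(in)));
    }

    public static void main(String[] args) throws IOException {
        String raw = "<desc>/wap/news/text.jsp?sid=242&nid=5662369&cid=5038&scid=-1</desc>";
        InputStream fixed = fixAmpersands(new ByteArrayInputStream(raw.getBytes()));
        // prints <desc>/wap/news/text.jsp?sid=242&amp;nid=5662369&amp;cid=5038&amp;scid=-1</desc>
        System.out.println(new String(slurp(fixed)));
    }
}
```

In the question's code, the only change would then be document = builder.parse(AmpFixer.fixAmpersands(is)); everything else stays the same.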
gpampara