
I'm parsing XML in Java using StAX, but my XML is not well-formed, so the parser throws an error. The XML contains unclosed tags.

For example:

<person>
  <name>John</name>
  <age>21
  ...
  ...
</person>

The <age> tag has no closing </age> tag, so I need to fix the XML first.

How can I fix the XML so that the unclosed tags get closed?

Is there a library that does this? I've tried JTidy and HtmlCleaner, but I still can't figure out how to fix the XML with them. I need a Java library, not a standalone application. Thanks.

+5  A: 

I don't think there is a ready-made solution for fixing XML, because it's impossible to know whether

<person>
  <name>John</name>
  <age>21
  <birthDate>...</birthDate>
  ...
</person>

is to be

<person>
  <name>John</name>
  <age>21
  <birthDate>...</birthDate>
  </age>
  ...
</person>

or

<person>
  <name>John</name>
  <age>21</age>
  <birthDate>...</birthDate>
  ...
</person>

I think that kind of logic can only be handled by a custom string parser in which you specify how the data should be transformed.
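As a rough illustration of such a custom parser, here is a small stack-based pass over the tags. This is a hypothetical sketch, not an existing library: the class name `NaiveTagCloser` and its repair policy are assumptions. It picks the second interpretation above: an element that already holds non-blank text is closed as soon as another start tag arrives, and any element still open when a parent's end tag (or the end of input) is reached gets closed there. It does not handle mixed content, CDATA sections, or `>` inside comments or attribute values, so treat it as a starting point only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveTagCloser {

    // Splits the input into tags (<...>) and the text runs between them.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");

    /** An open element on the stack, remembering whether it has seen non-blank text. */
    private static final class Open {
        final String name;
        boolean hasText;
        Open(String name) { this.name = name; }
    }

    /** Hypothetical repair policy: close text-only elements before a new start tag. */
    public static String closeUnclosedTags(String xml) {
        StringBuilder out = new StringBuilder();
        Deque<Open> stack = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(xml);
        while (m.find()) {
            String tok = m.group();
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1).trim();
                if (stack.stream().anyMatch(o -> o.name.equals(name))) {
                    // Close every child still open inside this element.
                    while (!stack.peek().name.equals(name)) {
                        out.append("</").append(stack.pop().name).append('>');
                    }
                    stack.pop();
                    out.append(tok);
                }
                // An end tag with no matching open element is dropped.
            } else if (tok.startsWith("<?") || tok.startsWith("<!") || tok.endsWith("/>")) {
                out.append(tok); // declarations, comments, empty elements: copied as-is
            } else if (tok.startsWith("<")) {
                // An open element that already holds text is treated as a finished leaf.
                if (!stack.isEmpty() && stack.peek().hasText) {
                    out.append("</").append(stack.pop().name).append('>');
                }
                String name = tok.substring(1, tok.length() - 1).trim().split("\\s+")[0];
                stack.push(new Open(name));
                out.append(tok);
            } else {
                if (!stack.isEmpty() && !tok.trim().isEmpty()) {
                    stack.peek().hasText = true;
                }
                out.append(tok);
            }
        }
        // Close whatever is still open at the end of the input.
        while (!stack.isEmpty()) {
            out.append("</").append(stack.pop().name).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<person><name>John</name><age>21<birthDate>1990</birthDate></person>";
        System.out.println(closeUnclosedTags(broken));
    }
}
```

With this policy, `<age>21<birthDate>` becomes `<age>21</age><birthDate>`. If the first interpretation is the right one for your data, the rule in the start-tag branch is the part to change.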

extraneon
So how do I do that, i.e. close <age> with </age>?
Riizent
I've read that HtmlCleaner/Tidy can do that for HTML but not for XML. I wonder how HtmlCleaner does it, or whether it could be modified to handle XML.
Riizent
A: 

Instead of fixing the XML, you can try turning off validation with:

import javax.xml.stream.XMLInputFactory;

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_VALIDATING, false); // disable DTD validation
NA
It still throws an error and can't continue parsing. According to the Javadoc, IS_VALIDATING refers to DTD validation.
Riizent
"Valid" means something different than "well-formed"
kdgregory
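To make that distinction concrete: `IS_VALIDATING` only controls DTD validation, while the XML 1.0 specification classes well-formedness violations as fatal errors, so a StAX parser reports them whatever that flag says. A minimal sketch using only the JDK's `javax.xml.stream` API; the class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxWellFormedCheck {

    /** Returns true if StAX reads the whole document, false on a parse error. */
    public static boolean parsesCleanly(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turns off DTD validation only; well-formedness is still enforced.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // throws XMLStreamException at the first malformed construct
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            System.out.println("Parse error: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesCleanly("<person><name>John</name><age>21</person>"));
        System.out.println(parsesCleanly("<person><name>John</name><age>21</age></person>"));
    }
}
```

The first call fails on the mismatched `</person>` even with validation off; the second, with `</age>` in place, parses.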
+3  A: 

Find the person who generated the XML and beat them senseless.

It's a basic point of XML that a document is always well-formed. This is very, very easy to do, equally easy to test, and it's a foundation stone for everything else. If someone out there is writing code that can't even get that right, they don't deserve to be working as a programmer. Seriously, they should be flipping burgers or digging ditches instead.

Writing code to deal with their crappy code is not a good long-term solution. It doesn't do anything to address the problem of their crappy code.

I appreciate that this probably doesn't help much.
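On the producing side, "very, very easy to do" holds when the XML is written through an XML API rather than by string concatenation. A minimal sketch with the JDK's `javax.xml.stream.XMLStreamWriter`; `PersonWriter` and `writePerson` are hypothetical names, and the elements mirror the question's example. `XMLStreamWriter` does not itself check well-formedness (its Javadoc says so), so each `writeStartElement` is explicitly paired with a `writeEndElement`; the writer does escape `&`, `<` and `>` in text.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PersonWriter {

    /** Builds a <person> element in which every start tag gets a matching end tag. */
    public static String writePerson(String name, int age) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("person");
        w.writeStartElement("name");
        w.writeCharacters(name); // &, < and > in the text are escaped by the writer
        w.writeEndElement();     // </name>
        w.writeStartElement("age");
        w.writeCharacters(Integer.toString(age));
        w.writeEndElement();     // </age>
        w.writeEndElement();     // </person>
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writePerson("John", 21));
    }
}
```

Testing the producer is then a one-liner: feed its output back through any StAX or SAX parser in a unit test and fail on an exception.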

Tom Anderson
Or tell them to switch to YAML so they don't have to close those pesky tags.
Trevor Tippins