My XML is malformed around the <img> tag. Specifically, I want every <img> tag that does not end with /> to be corrected. How do I match such a pattern and use replaceAll to fix it?

Pattern r = "<img.*?[^/]>" // something like that?
+4  A: 

You forgot a semicolon :)

No, seriously: use an (X)HTML parser/cleanup API that can convert tag soup (HTML) to XHTML. JTidy, among others, can do that in a single call:

new Tidy().parseDOM(inputStream, outputStream);

Regex is simply not well suited for this job.
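
To illustrate the point, here is a minimal sketch of the regex route the question hints at, using only java.util.regex. The class name NaiveImgRegex, the method closeImgTags, and the exact pattern are assumptions made for this example, not something from the question. It copes with simple tags, but a '>' inside an attribute value ends the match too early, so the replacement lands in the middle of the attribute.

```java
import java.util.regex.Pattern;

public final class NaiveImgRegex {
    // "<img", a run of non-'>' characters, optional whitespace, then a '>'
    // that is not directly preceded by '/'. (Hypothetical pattern for this sketch.)
    private static final Pattern UNCLOSED_IMG =
            Pattern.compile("<img\\b([^>]*?)\\s*(?<!/)>");

    public static String closeImgTags(String xml) {
        // $1 re-inserts the attributes captured between "<img" and '>'.
        return UNCLOSED_IMG.matcher(xml).replaceAll("<img$1 />");
    }

    public static void main(String[] args) {
        // Simple case: the tag is closed as intended.
        System.out.println(closeImgTags("<img src=\"a.png\">"));
        // Already closed: no match, so the input is returned unchanged.
        System.out.println(closeImgTags("<img src=\"a.png\"/>"));
        // Quoted '>' in an attribute: the regex stops at the first '>'
        // and corrupts the tag instead of fixing it.
        System.out.println(closeImgTags("<img alt=\"a > b\" src=\"x.png\">"));
    }
}
```

Comments, CDATA sections, and upper-case tag names are further cases this pattern does not handle, which is why a parser is the safer tool.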

BalusC
Thanks. But is it necessary to include an external library for this simple operation? I just want something simpler.
Yang
Simple operation? Why can't you code it yourself then? ;) It really isn't as simple as you seem to think.
BalusC
Just add the jar to your classpath :)
Alfred
Sorry, I am developing mobile apps, so I need to minimize the use of external resources to keep performance up. I don't see the need to add the jar at this point. I'll see if someone else can come up with a better idea.
Yang
Well, then write/reinvent an HTML parser/cleanup API yourself so that you don't need to include a JAR :)
BalusC
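
For anyone who does go the no-JAR route, a hand-written scanner for this one case might look like the sketch below. It is only an illustration under stated assumptions: the class ImgTagCloser and its methods are hypothetical names, and it handles lower-case <img> tags with quoted attributes only. It is not a general HTML cleanup and ignores comments, CDATA, and every other kind of tag soup.

```java
public final class ImgTagCloser {

    /** Rewrites every <img ...> that lacks a trailing "/" as <img ... />. */
    public static String closeImgTags(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 16);
        int pos = 0;
        while (true) {
            int start = xml.indexOf("<img", pos);
            if (start < 0) break;
            int afterName = start + 4;
            // Skip lookalikes such as <imgmap>: the tag name must end here.
            if (afterName < xml.length()
                    && Character.isLetterOrDigit(xml.charAt(afterName))) {
                out.append(xml, pos, afterName);
                pos = afterName;
                continue;
            }
            int end = findTagEnd(xml, afterName);
            if (end < 0) break; // no closing '>': leave the remainder untouched
            // Ignore whitespace before '>' to find the last real character.
            int last = end;
            while (last > afterName && Character.isWhitespace(xml.charAt(last - 1))) {
                last--;
            }
            out.append(xml, pos, last);
            if (xml.charAt(last - 1) != '/') out.append(" /");
            out.append('>');
            pos = end + 1;
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    // Index of the '>' that ends the tag, ignoring any '>' inside quotes; -1 if none.
    private static int findTagEnd(String s, int from) {
        char quote = 0;
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return i;
            }
        }
        return -1;
    }
}
```

Tracking quote state is what lets this version survive a '>' inside an attribute value, and it is also a hint of how quickly the special cases pile up once the input is real-world markup.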