I had been using Cobra until now because of how easy it is, but unfortunately it had problems with a few of my test cases. Can anyone suggest a tried-and-tested library?

I've tried Cobra's built-in parser and HTMLCleaner without any luck.

+1  A: 

Take a look at Saxon (no, I'm not involved in any way with the product, just a satisfied user).

Jim Garrison
Thanks. Just realized I asked the wrong question...
Legend
Pavel Minaev
@Pavel - The original question didn't mention HTML
Jim Garrison
+1  A: 

Mozilla HTML Parser looks rather interesting. By definition, it's supposed to be as good as the Gecko engine itself, which is likely to cover your needs.

Pavel Minaev
+4  A: 

TagSoup is really great when dealing with crappy HTML/XHTML.

Jericho (and NekoHTML) are also good for parsing invalid HTML.

TagSoup and Jericho: tried-and-tested. NekoHTML: recommended by a trustworthy source.
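
For anyone wondering what using TagSoup looks like: it exposes a standard SAX `XMLReader`, so code written against the JDK's SAX interfaces can work with it. Below is a minimal sketch, not a definitive implementation. The class `LinkExtractor` and the method `extractLinks` are hypothetical names made up for illustration, and `main` uses the JDK's own parser (which only accepts well-formed input) so the example runs without extra jars.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
public class LinkExtractor {

    /** Collects the href values of <a> elements using whatever SAX reader is supplied. */
    public static List<String> extractLinks(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // localName can be empty when the reader is not namespace-aware,
                // so fall back to the qualified name.
                String name = (localName == null || localName.isEmpty()) ? qName : localName;
                if ("a".equalsIgnoreCase(name)) {
                    String href = atts.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // JDK reader: strict, so the sample input below is well-formed.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        String xhtml = "<html><body><a href=\"http://example.com/\">x</a></body></html>";
        System.out.println(extractLinks(reader, xhtml));
    }
}
```

To handle malformed HTML, the idea is to pass `new org.ccil.cowan.tagsoup.Parser()` as the reader instead (this needs the TagSoup jar on the classpath); the handler code should stay the same.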

Pascal Thivent
+1 for NekoHTML
flybywire
+1  A: 

[Answering the title - the overall question and comments are not consistent]

JTidy (http://jtidy.sourceforge.net/) is a port of Dave Raggett's HTMLTidy. It's very useful though I think development may have slowed/ceased.

peter.murray.rust
+1  A: 

I suggest Validator.nu's parser, based on the HTML5 parsing algorithm. (Mozilla is currently in the process of replacing its own HTML parser with this one.)

Ms2ger