I am preparing some custom performance tests against a legacy application that outputs nonstandard HTML (missing tags, duplicate quotes, missing quotes, the works). The output can't be changed right now, for all the usual reasons.

I am looking for a library similar to BeautifulSoup or "HTML Agility Pack" that can be called from C or Java on a UNIX host.

We'll build some test scaffolding and then start redesigning and reimplementing, but I need some baseline measurements first.

+1  A: 

TagSoup - http://home.ccil.org/~cowan/XML/tagsoup/
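For what it's worth, TagSoup exposes its parser as a standard SAX `XMLReader` (the class `org.ccil.cowan.tagsoup.Parser`), so test scaffolding can be written against plain SAX interfaces. Below is a minimal sketch under that assumption. The class name `SaxElementCounter` and the method `countElements` are made up for illustration. The `main` method uses the JDK's built-in strict parser on well-formed input only so the sketch runs without extra jars; with TagSoup on the classpath you would pass `new org.ccil.cowan.tagsoup.Parser()` instead to handle the broken markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts element start tags reported by any SAX XMLReader.
class SaxElementCounter {

    // 'reader' may be the JDK parser (strict) or, assuming the TagSoup jar is
    // available, new org.ccil.cowan.tagsoup.Parser() (lenient).
    static int countElements(XMLReader reader, String markup) throws Exception {
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++; // one increment per opening tag the parser reports
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // The JDK parser needs well-formed input; swap in TagSoup for tag soup.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String page = "<html><body><p>a</p><p>b</p></body></html>";
        long start = System.nanoTime();
        int elements = countElements(reader, page);
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println(elements + " elements parsed in " + elapsedMicros + " us");
    }
}
```

Because the helper only depends on `XMLReader`, the same harness could time different parsers against the same captured pages when collecting baseline numbers.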

danben
I wish I could accept both answers.
florin
+2  A: 

HTML Tidy (callable from C) or JTidy (its Java port)

Nikolaus Gradwohl