How to skip the well-formedness check in XML


+2 Q:


Hi,

I am using XPath (and Java) to extract information from some websites. However, some of these websites are not well-formed, so I cannot process them. Is there any way to skip the well-formedness check, or alternatively to specify tags that shouldn't be checked for well-formedness?

Thanks,
Rp

+5 A:

Preprocess with Tidy (HTML Tidy), which rewrites malformed HTML as well-formed XHTML that an XML parser will accept.

Morendil 2009-02-10 18:20:29

There's actually a Java port: http://sourceforge.net/projects/jtidy

BC 2009-02-10 18:23:00
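Once Tidy (or JTidy) has produced well-formed XHTML, the standard JAXP parser and XPath work as usual. A minimal sketch of that second step, assuming the tidied markup is already in a string: the JTidy call itself is left out, and the class name `TidiedXPath` and the sample markup are hypothetical. It turns off fetching of the external XHTML DTD named in the DOCTYPE, using a Xerces feature that the JDK's built-in parser supports, so parsing does not go to the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TidiedXPath {
    // Parses markup that has already been made well-formed (e.g. by Tidy/JTidy)
    // and evaluates an XPath expression against it, returning the string value.
    public static String evaluate(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Do not download the XHTML DTD referenced by the DOCTYPE.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for Tidy's XHTML output.
        String tidied = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Example</title></head>"
                + "<body><p>First</p><p>Second</p></body></html>";
        // XHTML is the default namespace here, so match elements by local-name().
        System.out.println(evaluate(tidied, "//*[local-name()='title']"));
        System.out.println(evaluate(tidied, "count(//*[local-name()='p'])"));
    }
}
```

Matching on `local-name()` sidesteps the XHTML default namespace; the alternative is to register a `NamespaceContext` on the `XPath` object and use prefixed names.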

+1 A:

You probably don't want to use an XML parser to parse HTML. You'd be better off using a library such as HtmlUnit or HtmlParser.

Marc Novakowski 2009-02-10 18:21:08
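A minimal sketch of the same idea using only the JDK. Note that it swaps in the Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`), which is neither HtmlUnit nor HtmlParser and only knows HTML 3.2. It builds a DOM from the parser's callbacks so that XPath can run on pages an XML parser would reject. The class name `LenientHtmlDom` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class LenientHtmlDom {
    // Builds a DOM from possibly malformed HTML; the Swing parser closes or
    // ignores bad tags instead of rejecting the whole input.
    public static Document parse(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Deque<Node> open = new ArrayDeque<>(); // stack of currently open nodes
        open.push(doc);

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Element e = element(tag, attrs);
                if (e != null) {
                    parent().appendChild(e);
                    open.push(e);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Element e = element(tag, attrs); // <br>, <img>, unknown tags, ...
                if (e != null) {
                    parent().appendChild(e);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Close the nearest open element with this name; ignore stray end tags.
                for (Node n : open) {
                    if (n.getNodeName().equals(tag.toString())) {
                        while (open.pop() != n) { }
                        return;
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                Node p = parent();
                if (p != doc) { // a Document node cannot hold text directly
                    p.appendChild(doc.createTextNode(new String(data)));
                }
            }

            // Content arriving after the root has closed goes into the root,
            // because a Document may contain only one element.
            private Node parent() {
                Node top = open.peek();
                return (top == doc && doc.getDocumentElement() != null)
                        ? doc.getDocumentElement() : top;
            }

            private Element element(HTML.Tag tag, MutableAttributeSet attrs) {
                Element e;
                try {
                    e = doc.createElementNS(null, tag.toString());
                } catch (DOMException badName) {
                    return null; // tag name is not a legal XML name; drop the tag
                }
                Enumeration<?> names = attrs.getAttributeNames();
                while (names.hasMoreElements()) {
                    Object name = names.nextElement();
                    try {
                        e.setAttributeNS(null, name.toString(), String.valueOf(attrs.getAttribute(name)));
                    } catch (DOMException badName) {
                        // skip attributes whose names are not legal XML names
                    }
                }
                return e;
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String broken = "<html><head><title>Broken page</title></head>"
                + "<body><p>One<p>Two<br><b>unclosed</body></html>";
        Document doc = parse(broken);
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(xpath.evaluate("//title", doc));
        System.out.println(xpath.evaluate("count(//p)", doc));
    }
}
```

Because the Swing parser is tied to HTML 3.2, it can mangle newer markup; the dedicated libraries named above are the sturdier choice when adding a dependency is an option.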

+2 A:

TagSoup is a SAX-compliant parser written in Java that can handle all kinds of broken HTML. Try using TagSoup as your XML parser and then process the output with XPath.

potyl 2009-02-10 18:29:16
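Because TagSoup implements the SAX `XMLReader` interface, the standard JAXP identity transform can collect its events into a DOM for XPath. A minimal sketch of that plumbing; the class `SaxToXPath` and helper `toDom` are hypothetical names. The helper accepts any `XMLReader`, so with TagSoup on the classpath you would pass `new org.ccil.cowan.tagsoup.Parser()`; the demo passes the JDK's own strict SAX parser on well-formed input only so that it runs without TagSoup. TagSoup reports elements in the XHTML namespace, so the query matches on `local-name()`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToXPath {
    // Runs the given SAX reader over the markup and builds a DOM from its events
    // via an identity transform. Pass TagSoup's Parser here for broken HTML.
    public static Node toDom(XMLReader reader, String markup) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(new SAXSource(reader, new InputSource(new StringReader(markup))), result);
        return result.getNode(); // the Document that was built
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's strict parser (TagSoup is not on this classpath).
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Node doc = toDom(reader, "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Example</title></head><body/></html>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Elements are in the XHTML namespace, so match by local-name().
        System.out.println(xpath.evaluate("//*[local-name()='title']", doc));
    }
}
```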

+3 A:

Check out NekoHTML (http://nekohtml.sourceforge.net/) for turning the HTML into a DOM object.

Rob Di Marco 2009-02-10 18:43:21

Good day, how are you? Are you well?

lona 2010-07-03 12:29:58
