Is there C++ code or a library to convert an HTML document to an XML document? Thanks.

A: 

If your XHTML is properly formed, then it is pretty much XML.

If you load the document with any C++ XML parser and the parser accepts it, you can simply write it back out again.
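Whether the parser accepts the document depends mostly on well-formedness: XML requires every start tag to be closed, while HTML lets you omit some end tags. Here is a rough sketch of that one rule, just to show what goes wrong. The function name `tags_balanced` is made up for illustration, and it deliberately ignores comments, CDATA, doctypes and `>` inside attribute values, so it is not a substitute for a real XML parser.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if every start tag has a matching,
// properly nested end tag. Simplified on purpose (no comments, CDATA,
// doctypes, or '>' inside attribute values).
bool tags_balanced(const std::string& markup) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        std::size_t end = markup.find('>', pos);
        if (end == std::string::npos) return false;  // unterminated tag
        std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag.back() == '/') continue;  // self-closing, e.g. <br/>
        bool closing = tag.front() == '/';
        std::string name = tag.substr(closing ? 1 : 0);
        name = name.substr(0, name.find_first_of(" \t\r\n"));  // drop attributes
        if (!closing) {
            open.push_back(name);
            continue;
        }
        if (open.empty() || open.back() != name) return false;  // mismatched end tag
        open.pop_back();
    }
    return open.empty();
}
```

A list written with explicit `</li>` end tags passes this check, while the same list with the end tags omitted does not, and that is the kind of input an XML parser will refuse.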

bobobobo
`<ul><li>one<li>two<li>three</ul>` is valid HTML 4 Strict, but it is not XML.
Jeffrey Aylesworth
I have already tried, but it didn't work. I'm using libxml++ 2.6
Eduardo
@Jeff, I know. I added an (X) in front.
bobobobo
+3  A: 

You can take a look at the Tidy library:

Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML.

The library is written in C.

Bertrand Marron
Do you know of any example showing how to use it?
Eduardo
There is a simple example at the link I gave you (http://tidy.sourceforge.net/libintro.html#example).
Bertrand Marron
A: 

I wanted to convert to XML so I could parse it with libxml++, but then I found this library: http://htmlcxx.sourceforge.net/. With it, I can parse XML and HTML without any conversion.

Eduardo