Is there C++ code or a library to convert an HTML document to an XML document? Thanks.
A:
If your XHTML is well formed, then it is pretty much XML already.
If you use any C++ XML parser, you can load the document, hope it parses, and then write it back out again.
bobobobo
2010-03-06 17:38:26
`<ul><li>one<li>two<li>three</ul>` is valid HTML 4 Strict, but it is not XML.
Jeffrey Aylesworth
2010-03-06 17:40:08
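To make the point in the comment above concrete, here is a minimal sketch of why that `<ul>` snippet fails as XML: every start tag in XML needs a matching end tag in the right order, while HTML 4 lets `</li>` be omitted. The function name `tagsAreBalanced` is hypothetical, and this is only a simplified nesting check under stated assumptions (no comments, CDATA, or `>` inside attribute values), not a real XML parser.

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper: returns true if every start tag is closed by a matching
// end tag in the correct order. Simplified on purpose: it assumes no comments,
// no CDATA sections, and no '>' characters inside attribute values.
bool tagsAreBalanced(const std::string& markup) {
    std::stack<std::string> open;  // names of tags that are still open
    std::size_t pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        std::size_t end = markup.find('>', pos);
        if (end == std::string::npos) return false;  // unterminated tag
        std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '!' || tag[0] == '?') continue;  // crude skip of declarations

        bool closing = (tag[0] == '/');
        bool selfClosing = (tag.back() == '/');

        // Extract the element name: stop at whitespace or a trailing '/'.
        std::size_t start = closing ? 1 : 0;
        std::size_t len = 0;
        while (start + len < tag.size() &&
               !std::isspace(static_cast<unsigned char>(tag[start + len])) &&
               tag[start + len] != '/') {
            ++len;
        }
        std::string name = tag.substr(start, len);
        if (name.empty()) return false;

        if (closing) {
            // An end tag must match the most recently opened element.
            if (open.empty() || open.top() != name) return false;
            open.pop();
        } else if (!selfClosing) {
            open.push(name);
        }
    }
    return open.empty();  // anything still open means a missing end tag
}
```

Calling `tagsAreBalanced("<ul><li>one<li>two<li>three</ul>")` returns `false`, because `</ul>` arrives while an `<li>` is still open. The same list with explicit `</li>` end tags returns `true`. A strict XML parser rejects the first form for the same reason, which is why a repair step is needed before the HTML can be loaded as XML.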
I have already tried that, but it didn't work. I'm using libxml++ 2.6.
Eduardo
2010-03-06 17:44:08
@Jeff, I know. I added an (X) in front.
bobobobo
2010-03-06 18:26:10
+3
A:
You can take a look at the Tidy library.
Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML.
The library is written in C.
Bertrand Marron
2010-03-06 17:58:49
There is a simple example on the link I gave you. (http://tidy.sourceforge.net/libintro.html#example)
Bertrand Marron
2010-03-06 19:41:35
A:
I wanted to convert to XML so that I could parse it with libxml++, but then I found this library: http://htmlcxx.sourceforge.net/ With it I can parse both XML and HTML without any conversion.
Eduardo
2010-03-14 12:38:45