Are there any C++ libraries available to read

+2  A: 

libcurl is your friend + tidy (HTML tidy) if you've got broken HTML to fix.

Edit: Here is the full sequence

HTML (in a file) -> tidy (which will clean up the malformed HTML) -> XSLT transformation (you'll need to provide an XSL file to translate your HTML to LaTeX; use libxml2/libxslt from http://xmlsoft.org/ for this step) -> the LaTeX document is then processed with latex (by forking out to the latex command). Alternatively, you could download the source code for LyX and see how they do it (http://www.lyx.org/). Unfortunately the sequence is too complex to write up as a single example; all I can give you is the sequence...
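For orientation only, the stages above can be written down as the external commands a C++ program might fork out to. This is a rough sketch under several assumptions, not a tested pipeline: it swaps in the `tidy` and `xsltproc` command-line tools for the libtidy/libxslt library calls, it adds a `dvips` step (not mentioned above) because `latex` emits DVI rather than PostScript, and `pipeline_commands`, `page` and `html2latex.xsl` are hypothetical names. The XSL stylesheet itself still has to be written by you.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns one shell command per stage, in order.
// `base` is the input file name without its .html extension,
// `xsl` is the path to a stylesheet you provide (HTML -> LaTeX).
std::vector<std::string> pipeline_commands(const std::string& base,
                                           const std::string& xsl) {
    return {
        // 1. Repair malformed HTML and emit well-formed XHTML.
        "tidy -q -asxhtml -numeric -o " + base + ".xhtml " + base + ".html",
        // 2. Apply the stylesheet to produce a LaTeX source file.
        "xsltproc -o " + base + ".tex " + xsl + " " + base + ".xhtml",
        // 3. Typeset the LaTeX source (produces a .dvi file).
        "latex " + base + ".tex",
        // 4. Assumption: convert the DVI output to PostScript with dvips.
        "dvips -o " + base + ".ps " + base + ".dvi",
    };
}
```

Each string could then be handed to `std::system` (from `<cstdlib>`) in turn. Note that tidy exits with status 1 when it only has warnings to report, so a non-zero status from the first stage is not necessarily a failure.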

Nim
Can you please share a sample program showing how it is used in C++?
khyathi
First, what are you trying to do? E.g. read from a URL? Read from a file? Do you need to treat it as a DOM, or are you looking for something specific? Without that information, it's a stab in the dark...
Nim
I am just reading it from a file; after that I want to convert it to PostScript. Any idea which C++ libraries do this task?
khyathi
If you already have the HTML in a file, you could try "html2ps". I think it comes pre-installed on some *nixes, or there is a Perl script available via Google. If you want a nice library to do this, I'm not sure one exists (Apache FOP is one option, but you'd have to somehow delegate the formatting operations to Java from your C++).
Nim
I should not use that; I need to write a C++ program to convert it into PostScript.
khyathi
If you REALLY wanted to do this programmatically, there is one other option: use XSL to translate your well-formed HTML to LaTeX, and then use latex to build your PS file.
Nim
We can't use any scripting tool.
khyathi
latex is not a scripting tool; it's a program available on most flavours of *nix. Okay, you really are making this difficult for yourself. PS is a simple programming language, so you could in theory write your own XSL to translate HTML directly to PS. This is probably your best bet.
Nim
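To give a feel for how simple PostScript is as an output format, here is a minimal sketch in plain C++ that writes already-extracted text lines as a one-page PS program. It is an illustration only: it does no HTML parsing or XSL, `ps_escape` and `to_postscript` are hypothetical names, and the font, margins and line spacing are arbitrary choices.

```cpp
#include <string>
#include <vector>

// Escape the characters that are special inside a PostScript string
// literal: parentheses delimit the string and backslash starts an escape.
std::string ps_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build a single-page PostScript program that shows each line of text.
// Assumption: the lines fit on one page; there is no wrapping or paging.
std::string to_postscript(const std::vector<std::string>& lines) {
    std::string ps = "%!PS\n/Helvetica findfont 12 scalefont setfont\n";
    int y = 720;  // baseline of the first line, in points from the bottom
    for (const std::string& line : lines) {
        ps += "72 " + std::to_string(y) + " moveto (" + ps_escape(line) +
              ") show\n";
        y -= 14;  // move down one line (12 pt font plus some leading)
    }
    ps += "showpage\n";
    return ps;
}
```

Writing the returned string to a file such as `out.ps` should give something a PostScript viewer or printer can render. Mapping HTML elements (headings, paragraphs, lists) onto drawing operators like these is the part a translation step would have to supply.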
A: 

Have a look at the following: htmlcxx and wcHTML.

Also, a similar question has already been asked.

vitaut
Hey, I need it for Linux, not for Windows, and there shouldn't be any scripting.
khyathi
Both htmlcxx and wcHTML should be available for Linux.
vitaut
I need separate compiled files for the HTML parser so that I can add them directly to my C++ project and use it.
khyathi