There are many tools for converting LaTeX into HTML. I'm looking for a Java or C++ program to do this. It will need to run on multiple operating systems. The solution will be used on academic papers, so it should ideally also be able to interpret things like BibTeX.

I found htmltolatex which is a "Java program for converting HTML pages into LaTeX", but it doesn't seem to operate in the other direction.

Update: Just to clarify a little further: I want to distribute a package in another language that will accept any LaTeX document and produce HTML output (mostly of academic papers). I can't expect anything else to be installed (e.g. ghostscript, perl, latex2html, tth) on the machines already, and it needs to run cross platform. In other words, if I can find something that has compilable source code (or code in Java or C++) then I would rather go down that route so that the application is self contained. Alternatively, I will just use latex2html or tth and require the user to install those separately (although that's not ideal).

+2  A: 

I don't know of a native Java or C++ library to do this. But if you're generating HTML anyway, you could always use JavaScript to convert the LaTeX to HTML within the document.

jsMath is great at this:

http://www.math.union.edu/~dpvc/jsMath/

Inverse
I could be wrong, but doesn't jsMath only parse equations? I need to parse entire LaTeX documents, including formatting.
Shane
It parses LaTeX.
Inverse
+1  A: 

Why don't you just run LaTeX and convert the result (PostScript? PDF?) to HTML?

Ira Baxter
Is there a Java or C++ library to convert PostScript or PDF into HTML? This needs to run cross platform and can't rely on any external dependencies.
Shane
Ghostscript (GNU) AFAIK is cross platform and does PS -> PDF. I assume that it must be straightforward to find PDF -> HTML.
Ira Baxter
Ghostscript would be a dependency, and I can't rely on it being installed. I found pdftohtml (http://sourceforge.net/projects/pdftohtml/) which is C++, but after testing, it doesn't handle complicated documents.
Shane
I don't understand the objection. Ghostscript would be a dependency, but so would any other tool you suggest. So what? Further, you can't count on Ghostscript being installed, so you suggest some other off-the-wall tool. But you can't count on *that* being installed either! Whatever you commit to using will be a dependency, and you'll have to make sure it's installed.
Ira Baxter
Ghostscript is free and cross platform. What's wrong with bundling Ghostscript with whatever your solution is?
AndyL
A: 

As I see it, there are five fairly widely adopted tools for LaTeX-to-HTML conversion (there are many more that are less actively used):

  • LaTeX2HTML is a set of Perl scripts.
  • TtH is compiled and written in C.
  • Hevea is compiled and written in OCaml (with a GNU Library General Public License).
  • TeX4ht is compiled and written in C (with an LPPL license).
  • Another interesting looking option is plasTeX which is written in Python.

USENIX has a nice page showing how to use some of these.

So far, my best option seems to be TtH since I can readily compile the C source into my C++ application.

Shane
+7  A: 

Latex2html is the way to go. You say that you don't want any dependency, but any library you pick will be something you depend on. Latex2html:

  • works great,
  • is part of TeX,
  • is small enough that you can bundle the executable with your app,
  • is open source (GPL), so you might also try to link it into your program and avoid an external dependency (you would need to release under a GPL-compatible license, though),
  • supports BibTeX out of the box,
  • understands hyperlinks (if you convert from PostScript, you'll lose the hyperlinks).

I believe it compiles on all the major platforms (Linux, Windows, Mac) - but honestly I only have Linux so I can't say for sure.

Davide
Yes, but if I can use an API or raw source code, then I don't need to install any separate components. I'll test Latex2html and see how well it works; I've only used TtH before which also works well. Also, I can't find the Latex2html source code anywhere. Any ideas?
Shane
OK, if the library is not huge and you redistribute it with your code. But you can do that with latex2html too (either as source code or as a binary). In the past I developed a Java application which called a small external binary (written in C by a colleague). We bundled both in one distribution (of course it didn't work on all Java platforms, only on those for which he provided the binary). It was pretty easy, and it served our purposes well.
Davide
Btw, latex2html is written in Perl. No need to compile it...
Gonzalo
@Shane: get it from the links in the right column of http://packages.debian.org/sid/latex2html (you probably want the orig.tar.gz one)
Gonzalo
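For reference, the bundling approach described in the comments above (a Java application shipping a small native converter and calling it) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the `BundledConverter` class, the `bin/<platform>/tth` directory layout, and the file names `paper.tex` and `paper.html` are hypothetical. It also assumes the bundled tool works as a filter, reading LaTeX on standard input and writing HTML on standard output, which is how TtH is normally invoked; a tool such as latex2html that takes a file argument would need a different command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Sketch only: locates a converter binary shipped with the application
// and runs it as an external process.
final class BundledConverter {

    // Hypothetical layout: <appDir>/bin/<platform>/tth[.exe]
    static Path binaryFor(Path appDir, String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String platform;
        String exe = "tth";
        if (os.contains("mac") || os.contains("darwin")) {
            platform = "mac";          // checked first: "darwin" contains "win"
        } else if (os.contains("win")) {
            platform = "windows";
            exe = "tth.exe";
        } else {
            platform = "linux";        // fallback for other Unix-like systems
        }
        return appDir.resolve("bin").resolve(platform).resolve(exe);
    }

    // Runs the converter with the .tex file as stdin and the .html file as stdout.
    // Returns the process exit status.
    static int convert(Path binary, Path texFile, Path htmlFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(binary.toString());
        pb.redirectInput(texFile.toFile());
        pb.redirectOutput(htmlFile.toFile());
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show converter warnings
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path appDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path binary = binaryFor(appDir, System.getProperty("os.name"));
        int exit = convert(binary, Paths.get("paper.tex"), Paths.get("paper.html"));
        System.out.println("converter exited with status " + exit);
    }
}
```

As noted in the comment above, this only works on the platforms for which a binary is actually shipped, so the set of `bin/<platform>` directories defines the supported systems.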
+1  A: 

I use LyX as a frontend to LaTeX, which makes editing a lot more convenient and sort of produces its own flavor of LaTeX. The upside is that for LyX there is a separate HTML export, which uses all the extra information present in LyX. The tool is called eLyXer.

The homepage states:

There are some tools for TeX -> HTML conversion … but the results tend to be poor and rigid. eLyXer is meant to produce acceptable-to-beautiful HTML code, depending on your browser's Unicode and CSS rendering merits.

I can't really compare the output of eLyXer with the tex2html tools, but I can confirm that eLyXer produces clean, beautiful HTML code that probably does what you want. If you're willing to give LyX a shot :)

nes1983
+1 Thanks. That's really neat. I'll definitely check it out. I can't expect all my users to be using LyX as a front end, however, so it isn't really a feasible solution.
Shane