A volunteer project requires us to convert a large number of LaTeX documents into ePub format. It's a series of open-source fiction books which has so far been produced only on paper via a print-on-demand service. We'd like to be able to offer the books to users of e-reader devices (such as Kindle) which require the ePub format for best results.

Fortunately, ePub is a very simple format; however, there's no trivial way for LaTeX to produce the XHTML output required.

We experimented with alternative LaTeX compilers (e.g. plasTeX), but in the end we figured that it would probably be a lot easier to write our own compiler which understands a tiny subset of the LaTeX language and compiles directly to XHTML / ePub.

Previously I used a tool on Windows called GOLD. This allowed me to go directly from BNF grammars to a stub parser. It also allowed me to implement the parser in any language I liked (I'd choose Python).

This product has to work on Linux, so I'm wondering if there's an equivalent toolchain that works as well under Ubuntu / Eclipse / Python. The idea is that we will take the grammar of TeX and implement just a tiny subset of it, but we do not want to spend a huge amount of time worrying about grammar and parsing. A parser generator would obviously save us a great deal of time.

Sal


UPDATE 1: Bonus marks for a solution with excellent documentation or tutorials.


UPDATE 2: Extra bonus if there is a grammar file for TeX already available, since then all I'd have to do is implement the functions we care about.

+2  A: 

Try PLY.

Marcelo Cantos
+2  A: 

I once used tex4ht to convert LaTeX to XHTML+MathML. It worked quite nicely. From there, you could use the output HTML as the basis for the ePub.

Of course, this breaks the Python toolchain, so it might not become your favorite method...

Boldewyn
That's not a problem. We have Python and TeX people in the team of volunteers. The only issue is that we want to build on stable tools.
Salim Fadhley
+3  A: 

Try pyparsing.

See http://pyparsing.wikispaces.com/WhosUsingPyparsing and search for TeX. That page mentions a project where pyparsing is used to parse a subset of TeX syntax.

For documentation, I recommend the "Getting started with pyparsing" e-book, by pyparsing's author.
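
To give a rough idea of the approach, here is a minimal sketch of a tiny-subset translator built with pyparsing. It is only an illustration, not code from the project mentioned above: the `TAGS` mapping, the `to_xhtml` helper, and the choice to silently drop unknown commands are all assumptions made for this example. It handles nothing beyond simple `\command{...}` forms with nesting.

```python
import html

from pyparsing import Forward, ParserElement, Regex, Suppress, Word, ZeroOrMore, alphas

# Whitespace is significant in running text, so turn off pyparsing's default
# whitespace skipping. Note that this is a process-wide setting.
ParserElement.setDefaultWhitespaceChars("")

# Hypothetical mapping from LaTeX command names to XHTML element names.
TAGS = {"emph": "em", "textbf": "strong", "chapter": "h1"}


def render_command(tokens):
    """Wrap the already-translated argument in the mapped XHTML element."""
    name, inner = tokens[0], "".join(tokens[1:])
    tag = TAGS.get(name)
    # Assumption for this sketch: unknown commands keep their argument text
    # and lose the command itself.
    return "<{0}>{1}</{0}>".format(tag, inner) if tag else inner


body = Forward()

# Plain text: anything up to the next backslash or brace, escaped for XHTML.
text = Regex(r"[^\\{}]+")
text.setParseAction(lambda t: html.escape(t[0]))

# \name{ ... } where the argument may itself contain text and commands.
command = Suppress("\\") + Word(alphas) + Suppress("{") + body + Suppress("}")
command.setParseAction(render_command)

pieces = ZeroOrMore(command | text)
pieces.setParseAction(lambda t: "".join(t))
body <<= pieces


def to_xhtml(source):
    """Translate a string in the tiny LaTeX subset into an XHTML fragment."""
    return body.parseString(source, parseAll=True)[0]


if __name__ == "__main__":
    print(to_xhtml(r"A \emph{very} simple \textbf{format}"))
```

Supporting another command in this sketch is a matter of adding an entry to `TAGS`; anything more structural (environments, optional arguments, math) would need extra grammar rules.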

codeape