Hello,

I've been working on an app to create various document formats for a while now, and I've had limited success.

Ideally, I'd like to dynamically create a fairly simple ODT/PDF/DOC file. I've been focusing my efforts on ODT, because it is editable, and open enough that there are several tools which will convert it to any of the other formats I need.

The problem is that the ODT XML files are NOT simple, and I couldn't find any good-quality APIs (especially in Python). So far, I've had the most success creating a template ODT file and then manipulating the DOM in Python as needed. This is generally OK, but it is quickly becoming inadequate and requires too much tweaking every time I need to alter one of the templates.
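
For readers unfamiliar with the template approach, here is a minimal sketch of what it can look like, using only the standard library. It assumes the template contains literal placeholders such as `{{NAME}}` (a hypothetical convention, not something ODT defines), and the function name `fill_odt_template` is made up for illustration. One known weakness: an office suite may split a placeholder across several text spans, in which case this simple text-node replacement will not find it.

```python
import io
import zipfile
from xml.dom import minidom


def fill_odt_template(template_bytes, replacements):
    """Return new ODT bytes with placeholder text in content.xml replaced.

    `replacements` maps placeholder strings (hypothetical, e.g. "{{NAME}}")
    to the text that should appear in the finished document.
    """
    def walk(node):
        # Replace placeholders in every text node, then recurse into children.
        if node.nodeType == node.TEXT_NODE:
            for placeholder, value in replacements.items():
                node.data = node.data.replace(placeholder, value)
        for child in node.childNodes:
            walk(child)

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src:
        dom = minidom.parseString(src.read("content.xml"))
        walk(dom)
        with zipfile.ZipFile(out, "w") as dst:
            # Copy entries in their original order so "mimetype" stays first.
            for info in src.infolist():
                data = src.read(info.filename)
                if info.filename == "content.xml":
                    data = dom.toxml(encoding="utf-8")
                # ODF packaging expects "mimetype" to be stored uncompressed.
                if info.filename == "mimetype":
                    method = zipfile.ZIP_STORED
                else:
                    method = zipfile.ZIP_DEFLATED
                dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()
```

Usage would be along the lines of reading `template.odt` as bytes, calling `fill_odt_template(data, {"{{NAME}}": "World"})`, and writing the result to a new `.odt` file.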

The requirements are:

1) Produce a simple document that will have lists, paragraphs, and the ability to draw simple graphics on the page (boxes, circles, etc.)

2) The ability to specify page size. The different formats should generally produce the exact same output when sent to a printer.

My questions:

1) Are there any other ways I can produce ODT/PDF/DOC files?

2) Would LaTeX be acceptable? I've never really used it. Does anyone have experience converting LaTeX files into other formats?

3) Would it be possible to use HTML? There are a lot of converters online. Technically you can specify dimensions in mm, cm, etc., but I am worried that the printed output will differ between browsers and converters.
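
To make the HTML idea concrete, here is a rough sketch of generating a page with physical dimensions from Python. It relies on the CSS `@page` rule and `mm` units; how faithfully a given browser or converter honours them is exactly the open question above, so treat this as an experiment rather than a guarantee. The function name `render_html` and the `box`/`circle` class names are made up for illustration.

```python
from html import escape


def render_html(width_mm, height_mm, paragraphs, list_items):
    """Build a small HTML document sized in millimetres (sketch only)."""
    css = "\n".join([
        # Page size in physical units; support varies between renderers.
        f"@page {{ size: {width_mm}mm {height_mm}mm; margin: 20mm; }}",
        "body { font-family: serif; font-size: 11pt; margin: 0; }",
        # Simple graphics drawn as bordered blocks.
        ".box { width: 40mm; height: 20mm; border: 0.5mm solid black; }",
        ".circle { width: 20mm; height: 20mm; border: 0.5mm solid black;"
        " border-radius: 50%; }",
    ])
    body = [f"<p>{escape(text)}</p>" for text in paragraphs]
    body.append("<ul>")
    body.extend(f"<li>{escape(item)}</li>" for item in list_items)
    body.append("</ul>")
    body.append('<div class="box"></div>')
    body.append('<div class="circle"></div>')
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<style>\n{css}\n</style>\n</head>\n<body>\n"
        + "\n".join(body)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # A4 page with one paragraph and a two-item list.
    print(render_html(210, 297, ["Hello & welcome"], ["first", "second"]))
```

Printing the same file from two or three different browsers or converters and comparing the output against a ruler would show quickly whether the approach is consistent enough.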

Any other ideas?

A: 

I suppose that to be successful, you'd have to define how you want to input everything. Why don't you just use OpenOffice? It will save to ODT (duh...), PDF, and HTML (though the HTML isn't clean; it's actually quite ugly).

In my recent experience, I've had success going from LaTeX -> XHTML via LaTeXML (I had to compile it from source). LaTeX is seeming more and more like a terminal format: it's great for PDF, but once you need some flexibility, it kind of fails. I should also note that there is no LaTeX -> DVI step in my workflow, so I can't comment on things like tex4ht that read from a DVI file (I have too many graphics that don't work with DVI to switch them now).

Shortly I'll be moving everything into DocBook 4.5. I like the docbook-utils package, which supports LaTeX and HTML, and I even saw a converter to ODT. DocBook is super-heavy on the markup, which is annoying, but it will provide me with the flexibility I need going forward.

Since you're using Python, have you considered just using reStructuredText?

I've also really enjoyed publishing from Emacs' Org mode, a super-lightweight markup that exports to a bunch of different formats.

Mica
A: 

Have you tried pandoc? I've been using it with good success for converting between different formats. Why reinvent the wheel?
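
If you want to drive it from Python, something like the following sketch should do. It assumes a reasonably recent `pandoc` is installed and on your PATH; pandoc picks the output format from the destination file's extension, and `-f` names the input format explicitly. The helper names `pandoc_command` and `convert` are made up for illustration.

```python
import shutil
import subprocess


def pandoc_command(source, dest, from_format=None):
    """Build the pandoc argument list without running anything."""
    cmd = ["pandoc", source, "-o", dest]
    if from_format:
        # Only needed when pandoc cannot guess the input format itself.
        cmd += ["-f", from_format]
    return cmd


def convert(source, dest, from_format=None):
    """Run pandoc, raising if it is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_command(source, dest, from_format), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert("input.html", "output.odt", from_format="html")
```

Note that PDF output additionally needs a PDF engine (by default a LaTeX installation), so ODT or HTML is the easier first test.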

Habi
A: 

Mica, where was this DocBook->ODT converter you saw? I need to do that too. (Pandoc doesn't appear to read DocBook Lite XML.)

Karl Fogel
A: 

Thanks, Habi. I did look at Pandoc, but its website says it doesn't read DocBook XML, it only writes it. It can "read markdown and (subsets of) reStructuredText, HTML, and LaTeX".

Now, I could try the route of doing DocBook->HTML and then HTML->ODT. I don't know how much information would be lost that way; it's worth a shot, anyway...

... okay, I just tried using their online converter at http://johnmacfarlane.net/pandoc/try with the full HTML text of the book in question, producingoss.com/en/producingoss.html (note that this HTML is generated from DocBook XML masters). For the destination format I chose "OpenDocument XML".

500 Internal Server Error timeout

Okay, maybe that was too big. Let me try something smaller:

producingoss.com/en/bug-tracker.html

...that got decent-looking OpenDocument XML output, only with the main body of the text replaced with "TRUNCATED! Please download pandoc if you want to convert large files." So I'll have to download and try it out.

Karl Fogel