tags:

views:

520

answers:

7

Any idea how to take DVI files and turn them into tex?

+2  A: 

I am pretty sure this is not possible. DVI contains information about how to render the page, not about which TeX commands produced it.

Mouk
I think there should be no doubt that this is possible. The issue is whether it can be done well enough to be worthwhile.
Charles Stewart
+2  A: 

What you are asking is not possible. I think that (as with PostScript) even recognizing words in a DVI file may require heuristics. A DVI file is a description of where to place individual letters on a piece of paper, and nothing more.

You can get partway there either with dvi2tty or by running dvips followed by ps2ascii, whichever gives the better results.
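To compare the two routes side by side, something like the following Python sketch could work. It assumes dvi2tty, dvips and ps2ascii are on your PATH and that the first and last print their text to standard output when given just an input file. The function names and the paper.dvi file name are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def dvi2tty_command(dvi_path):
    """Command line for the direct route: dvi2tty prints plain text to stdout."""
    return ["dvi2tty", str(dvi_path)]


def dvips_commands(dvi_path):
    """Command lines for the indirect route: DVI -> PostScript -> plain text."""
    ps_path = Path(dvi_path).with_suffix(".ps")
    return [
        ["dvips", "-o", str(ps_path), str(dvi_path)],
        ["ps2ascii", str(ps_path)],
    ]


def extract_text(dvi_path):
    """Return {route name: text} for every route whose tools are installed."""
    routes = {
        "dvi2tty": [dvi2tty_command(dvi_path)],
        "dvips+ps2ascii": dvips_commands(dvi_path),
    }
    results = {}
    for name, commands in routes.items():
        # Skip a route entirely if any of its tools is missing.
        if any(shutil.which(cmd[0]) is None for cmd in commands):
            continue
        output = ""
        for cmd in commands:
            # The stdout of the last command in the route is the extracted text.
            done = subprocess.run(cmd, capture_output=True, errors="replace", check=True)
            output = done.stdout
        results[name] = output
    return results


if __name__ == "__main__":
    for route, text in extract_text("paper.dvi").items():
        print(f"--- {route} ---")
        print(text)
```

Either way you only get the text back, not the markup, so you still have to judge by eye which output is closer to what you need.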

Pascal Cuoq
+7  A: 

This is similar to the problem of turning PDF into XML, which is referred to as "trying to turn a hamburger back into a cow". Both TeX->DVI and XML->PDF lose information about the document's structure and its semantics.

It requires a great deal of heuristics and a large corpus to recreate (some of) the original document. The result is rarely, if ever, 100% accurate. Recovering the text strings may be possible; the vectors are harder. Bitmaps are almost impossible.

peter.murray.rust
Oh, I really like the hamburger/cow picture! Very very descriptive.
Boldewyn
@Boldewyn I got it from Mike Kay (Saxon), but he got it from somewhere else, I think.
peter.murray.rust
A: 

Read the description of the DVI file format and write the program yourself. The result of your program will not be the original text, but it will be usable.
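As a rough idea of what such a program might look like, here is a minimal Python sketch that walks the DVI opcodes and pulls out the characters. It makes several loud assumptions: character codes are treated as ASCII (true for plain letters in fonts such as Computer Modern, wrong for ligatures and math symbols), any horizontal move larger than a made-up `space_threshold` (in DVI units) counts as a word space, and any vertical move counts as a line break. The function name `dvi_to_text` is hypothetical.

```python
def dvi_to_text(data: bytes, space_threshold: int = 100_000) -> str:
    """Very rough text extraction from the bytes of a DVI file."""
    out = []        # characters collected so far
    stack = []      # saved (w, x) spacing registers for push/pop
    w = x = 0
    pos = 0

    def unsigned(n):
        nonlocal pos
        value = int.from_bytes(data[pos:pos + n], "big")
        pos += n
        return value

    def signed(n):
        nonlocal pos
        value = int.from_bytes(data[pos:pos + n], "big", signed=True)
        pos += n
        return value

    def glyph(code):
        # Assumption: the font encoding matches ASCII/Unicode for this code.
        out.append(chr(code) if code < 0x110000 else "?")

    def move_right(amount):
        # Heuristic: a big enough gap is a word space, smaller ones are kerns.
        if amount > space_threshold and out and not out[-1].isspace():
            out.append(" ")

    def new_line():
        if out and out[-1] != "\n":
            out.append("\n")

    while pos < len(data):
        op = data[pos]
        pos += 1
        if op <= 127:                 # set_char_0 .. set_char_127
            glyph(op)
        elif op <= 131:               # set1 .. set4
            glyph(unsigned(op - 127))
        elif op in (132, 137):        # set_rule, put_rule: height and width
            pos += 8
        elif op <= 136:               # put1 .. put4
            glyph(unsigned(op - 132))
        elif op == 138:               # nop
            pass
        elif op == 139:               # bop: ten counters plus a pointer
            pos += 44
        elif op == 140:               # eop
            new_line()
        elif op == 141:               # push
            stack.append((w, x))
        elif op == 142:               # pop
            w, x = stack.pop()
        elif op <= 146:               # right1 .. right4
            move_right(signed(op - 142))
        elif op <= 151:               # w0 .. w4
            if op > 147:
                w = signed(op - 147)
            move_right(w)
        elif op <= 156:               # x0 .. x4
            if op > 152:
                x = signed(op - 152)
            move_right(x)
        elif op <= 160:               # down1 .. down4
            pos += op - 156
            new_line()
        elif op <= 165:               # y0 .. y4
            pos += op - 161
            new_line()
        elif op <= 170:               # z0 .. z4
            pos += op - 166
            new_line()
        elif op <= 234:               # fnt_num_0 .. fnt_num_63
            pass
        elif op <= 238:               # fnt1 .. fnt4
            pos += op - 234
        elif op <= 242:               # xxx1 .. xxx4 (specials): skip payload
            length = unsigned(op - 238)
            pos += length
        elif op <= 246:               # fnt_def1 .. fnt_def4
            pos += (op - 242) + 12    # font number, checksum, scale, design size
            area_len = unsigned(1)
            name_len = unsigned(1)
            pos += area_len + name_len
        elif op == 247:               # pre: version, num, den, mag, comment
            pos += 13
            comment_len = unsigned(1)
            pos += comment_len
        else:                         # post and everything after: stop
            break
    return "".join(out).strip()
```

Usage would be along the lines of `print(dvi_to_text(open("paper.dvi", "rb").read()))`. Doing better than this (real glyph positions, ligatures, math) means reading the TFM font metrics as well, which is where most of the work in tools like dvi2tty goes.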

Alexey Malistov
+2  A: 

There's also catdvi, dvitype, and dvi2tty, available from CTAN.

lhf
+1  A: 

Hey all, for whoever finds this question again, and for everyone who answered: I found the best answer for me. What I was looking for is indeed difficult: figuring out an original TeX source that would compile to a given DVI (or PDF, for that matter, since I can easily turn the DVI into a PDF). InftyReader does it. It works perfectly: I tried it on a bunch of PDFs, recompiled the output into PDFs, and the results were perfect!

jarer
Yes, good call! OCR systems tend not to be smart about line breaks, though: have you looked at how it handles multi-line equations?
Charles Stewart
A: 

Err, well, sort of.

The path of least resistance will involve, I think, a dvi->rtf converter. I've posted a question: Q#1859373, "dvi2rtf: who can convert DVI files to RTF". There I posted an untested implementation, which gives a poor solution that throws away all formatting.

With such a thing, you could then use Word 2007/8 and the excellent docx2tex utility to turn the RTF into TeX.

The results would be unpleasant to read, but I can see some use cases for doing so.

Charles Stewart