Any idea how to take DVI files and turn them into tex?
I am pretty sure this is not possible. A DVI file contains information about how to render the page, not about which TeX commands produced it.
What you are asking is not possible. I think that (as with PostScript) even recognizing words in a DVI file may require heuristics. A DVI file is a description of where to place individual letters on a piece of paper, and nothing more.
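To illustrate the kind of heuristic this implies, here is a minimal Python sketch that groups positioned glyphs into words by looking at the horizontal gaps between them. Everything in it is an assumption made for illustration: the `(x, y, width, char)` tuples are a hypothetical intermediate form (a DVI file does not store glyph widths itself; they come from the font metrics), and the `gap_ratio` and `line_tol` thresholds are arbitrary guesses, not values taken from any real tool.

```python
from typing import List, Tuple

# Hypothetical glyph record: (x, y, width, char), all in the same units.
Glyph = Tuple[float, float, float, str]


def group_into_words(glyphs: List[Glyph],
                     gap_ratio: float = 0.3,
                     line_tol: float = 2.0) -> List[List[str]]:
    """Group glyphs into lines of words using simple spacing heuristics."""
    # Step 1: bucket glyphs into lines by comparing vertical positions.
    lines: List[List[Glyph]] = []
    for g in sorted(glyphs, key=lambda g: (g[1], g[0])):
        if lines and abs(lines[-1][0][1] - g[1]) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    # Step 2: within each line, start a new word whenever the gap to the
    # previous glyph is large relative to that glyph's width.
    result: List[List[str]] = []
    for line in lines:
        line.sort(key=lambda g: g[0])
        words = [line[0][3]]
        for prev, cur in zip(line, line[1:]):
            gap = cur[0] - (prev[0] + prev[2])
            if gap > gap_ratio * prev[2]:
                words.append(cur[3])
            else:
                words[-1] += cur[3]
        result.append(words)
    return result
```

Even this toy version has to guess at thresholds, and it knows nothing about kerning, ligatures, math or multi-column layouts, which is why the result is only ever an approximation of the text.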
You can get partway there by either dvi2tty, or by running dvips followed by ps2ascii, whichever gives the best results.
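A rough Python sketch of trying both routes and comparing the output by eye follows. It assumes that `dvi2tty`, `dvips` and `ps2ascii` are installed and on the PATH, that `dvi2tty file.dvi` and `ps2ascii file.ps` write their text to standard output, and that `dvips -o out.ps file.dvi` writes the PostScript file; check the manual pages on your system before relying on it. The function names are made up for this example.

```python
import os
import shutil
import subprocess
import tempfile


def build_commands(dvi_path: str, ps_path: str) -> dict:
    """Return the command lines for each route, keyed by a label."""
    return {
        "dvi2tty": [["dvi2tty", dvi_path]],
        "dvips+ps2ascii": [
            ["dvips", "-o", ps_path, dvi_path],
            ["ps2ascii", ps_path],
        ],
    }


def extract_text_candidates(dvi_path: str) -> dict:
    """Run every route whose tools are available; return label -> text."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = os.path.join(tmp, "out.ps")
        for label, steps in build_commands(dvi_path, ps_path).items():
            # Skip a route if any of its programs is not installed.
            if any(shutil.which(step[0]) is None for step in steps):
                continue
            text = None
            for step in steps:
                proc = subprocess.run(step, capture_output=True,
                                      text=True, errors="replace")
                if proc.returncode != 0:
                    text = None
                    break
                text = proc.stdout  # the last step's stdout is the text
            if text is not None:
                results[label] = text
    return results
```

Calling `extract_text_candidates("paper.dvi")` (with a hypothetical file name) returns a dictionary with one entry per route that succeeded, so you can pick whichever looks better.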
This is similar to the problem of turning PDF into XML which is referred to as "trying to turn a hamburger back into a cow". Both TeX->DVI and XML->PDF lose information, both in the structure of the document and its semantics.
Recreating (some of) the original document requires a great deal of heuristics and a large corpus, and the result is rarely 100% accurate. Text strings may be recoverable; vectors are harder, and bitmaps are almost impossible.
Read "Description of the DVI file format" and write the program yourself. The output of your program will not be the original source text, but it will be usable.
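As a starting point, here is a minimal Python sketch of such a program. It walks the DVI opcodes and collects the character codes typeset on each page, skipping everything else. It is only a sketch under several assumptions: the function name `dvi_char_codes` is made up, positions and fonts are ignored entirely, and the codes it returns are positions in the font's encoding, which match ASCII for plain letters in common fonts but not in general.

```python
def _uint(data: bytes, pos: int, n: int) -> int:
    """Read an n-byte big-endian unsigned integer starting at pos."""
    return int.from_bytes(data[pos:pos + n], "big")


# First opcode of each movement family that takes a 1..4 byte argument:
# right1, w1, x1, down1, y1, z1.
_MOVE_BASES = (143, 148, 153, 157, 162, 167)


def dvi_char_codes(data: bytes) -> list:
    """Return one list of character codes per page of a DVI byte string."""
    pages, page, pos = [], [], 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op <= 127:                      # set_char_0 .. set_char_127
            page.append(op)
        elif op <= 131:                    # set1 .. set4
            n = op - 127
            page.append(_uint(data, pos, n))
            pos += n
        elif op in (132, 137):             # set_rule, put_rule
            pos += 8
        elif 133 <= op <= 136:             # put1 .. put4
            n = op - 132
            page.append(_uint(data, pos, n))
            pos += n
        elif op == 139:                    # bop: ten counters + pointer
            page = []
            pos += 44
        elif op == 140:                    # eop
            pages.append(page)
        elif op in (138, 141, 142, 147, 152, 161, 166) or 171 <= op <= 234:
            pass                           # nop, push, pop, w0/x0/y0/z0, fnt_num
        elif 143 <= op <= 170:             # right, w, x, down, y, z
            base = max(b for b in _MOVE_BASES if b <= op)
            pos += op - base + 1
        elif 235 <= op <= 238:             # fnt1 .. fnt4
            pos += op - 234
        elif 239 <= op <= 242:             # xxx1 .. xxx4 (specials)
            n = op - 238
            pos += n + _uint(data, pos, n)
        elif 243 <= op <= 246:             # fnt_def1 .. fnt_def4
            pos += (op - 242) + 12         # font number, checksum, sizes
            pos += 2 + data[pos] + data[pos + 1]  # name lengths + name
        elif op == 247:                    # pre: id, num, den, mag, comment
            pos += 13
            pos += 1 + data[pos]
        elif op == 248:                    # post: pages are finished
            break
        else:
            raise ValueError(f"unexpected opcode {op} at byte {pos - 1}")
    return pages
```

You would call it as `dvi_char_codes(open("paper.dvi", "rb").read())`, with a file name of your own. To get readable text out of the codes you still need to map them through each font's encoding and decide where the word and line breaks fall from the movement commands, which this sketch deliberately throws away.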
Hey all, for whoever finds this question again, and for everyone who answered: I found the best answer for me. What I was looking for is indeed difficult: figuring out a TeX source that would compile to a given DVI (or PDF, for that matter, since I can easily turn the DVI into a PDF). InftyReader does it. It works perfectly: I tried it on a bunch of PDFs, compiled the resulting TeX back into PDFs, and the results were perfect!
Err, well, sort of.
The path of least resistance will, I think, involve a DVI->RTF converter. I've posted a question, Q#1859373 "dvi2rtf: who can convert DVI files to RTF", where I also posted an untested implementation; it is a poor solution that throws away all formatting.
With such a converter, you could then use Word 2007/8 and the excellent docx2tex utility to turn the RTF into TeX.
The results would be unpleasant to read, but I can see some use cases for doing so.