Scenario:
I have a document I created using LaTeX (my resume in this case). It compiles correctly with pdflatex and outputs exactly what I'd like. Now I need the same document converted to plain old ASCII.

Example:
I have seen this done (at least once) here, where the author has a PDF version and an ASCII version that matches the PDF version in almost every way, including margins, spacing and bullet points.

I realize this type of conversion cannot be exact due to limitations in the ASCII format, but a very close approximation does seem possible based on what I have found so far. What is the process for doing this?

+4  A: 

CatDVI can convert DVI to text and attempts to preserve the formatting.

Bearddo
Do you know how to turn off "justified" alignment?
chuckg
I sure don't, sorry.
Bearddo
Try piping it through fmt(1) with the `-u` option.
Cirno de Bergerac
Just remove the excess spacing, e.g. like this `catdvi foo.dvi | perl -pe 's/[ ]+/ /g'` gives me more reasonable output than `fmt`
Frank
+7  A: 

You can try some of the proposed programs here:

TeX to ASCII

Diego Sevilla
+1  A: 

My usual strategy is to use hyperlatex to turn it into a web page, and then copy and paste from a web browser. I find that this gives the best formatting.

I usually then have to go through and manually fix some line-wrapping...

Brian Postow
I tried this out, but unfortunately it doesn't support using an external `cls` file. I'm using a class file to handle repetitive formatting tasks, along with the enumitem package. Thanks though!
chuckg
hmmm, I don't think I've had problems with that... but it's been a while since I've used it... and I don't have any of my files at work...
Brian Postow
+2  A: 

Another option is to use htlatex to create a web page from the LaTeX sources, then use links to convert to plain text. I used the command line

links -dump -no-numbering -no-references input.html > output.txt

in the past, which gave a rather nice result. Of course, the output will match the rendered HTML rather than the original PDF, so it may not be exactly what you want.
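
For what it's worth, the two steps could be wrapped in a small shell function. This is only a sketch: `latex_to_text` and the `DRY_RUN` switch are names I made up for illustration, and it assumes htlatex writes `<name>.html` into the current directory, so run it from the directory that contains the .tex file.

```shell
# Hypothetical helper: LaTeX -> HTML (htlatex) -> plain text (links).
# Usage: latex_to_text resume.tex    (writes resume.txt)
latex_to_text() {
    src=$1
    base=${src%.tex}    # strip the .tex suffix to get the base name

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the commands that would be run.
        echo "htlatex $src"
        echo "links -dump -no-numbering -no-references $base.html > $base.txt"
        return 0
    fi

    # Assumption: htlatex produces "$base.html" in the current directory.
    htlatex "$src" || return 1
    links -dump -no-numbering -no-references "$base.html" > "$base.txt"
}
```

With `DRY_RUN=1` set it only prints the two commands, which is handy for checking the file names before running anything.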

bluebrother
A: 

You can import the file into LyX and use LyX's export-to-text feature.

It's kind of silly if you don't use LyX, but if you already have it, it's a very quick and easy solution. It gave a good result for me, although to be fair my files are pretty simple. I'm not sure how more elaborate files get converted.

DDD
A: 

Try the steps here: http://zanedp.livejournal.com/201222.html

Here is a sequence that converts my LaTeX file to plain text:

$ latex file.tex
$ catdvi -e 1 -U file.dvi | sed -re "s/\[U\+2022\]/*/g" | sed -re "s/([^^[:space:]])\s+/\1 /g" > file.txt

The -e 1 option to catdvi tells it to output ASCII. If you use 0 instead of 1, it will output Unicode. Unicode will include all the special characters like bullets, em dashes, and Greek letters. It also includes ligatures for some letter combinations like "fi" and "fl." You may not like that, so use -e 1 instead. Use the -U option to tell it to print out the Unicode value for unknown characters so that you can easily find and replace them.

The second part of the command finds the string [U+2022] which is used to designate bullet characters (•) and replaces them with an asterisk (*).

The third part eats up all the extra whitespace catdvi threw in to make the text full-justified while preserving spaces at the start of lines (indentation).
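
To see what the two sed stages do without running catdvi, here is a minimal sketch you can feed a sample line. The function name `clean_catdvi_text` is made up, and I've swapped the `-r` and `\s` from the command above for `-E` and `[[:space:]]`, which should behave the same in GNU sed and also work in BSD sed.

```shell
# Hypothetical filter: reads catdvi-style text on stdin, writes cleaned text.
clean_catdvi_text() {
    # Stage 1: replace the [U+2022] bullet marker with an asterisk.
    # Stage 2: collapse whitespace that follows a non-space character into
    #          one space; whitespace at the start of a line has no preceding
    #          non-space character, so indentation is left alone.
    sed -E 's/\[U\+2022\]/*/g' |
    sed -E 's/([^[:space:]])[[:space:]]+/\1 /g'
}

# Example:
#   printf '   [U+2022]  Led   a   team\n' | clean_catdvi_text
# prints "   * Led a team" (the three leading spaces survive).
```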

After running these commands, you would be wise to search the .txt file for the string [U+ to make sure no Unicode characters that can't be mapped to ASCII were left behind and fix them.
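
That final check can be done with grep. A small sketch follows; the function name `find_unmapped` is just for illustration, and the only thing it assumes is that leftover markers start with the literal text `[U+`, as the bullet marker above does.

```shell
# Hypothetical check: list lines that still contain a [U+...] marker.
# Exit status is 0 if any marker was found, 1 if the file is clean.
find_unmapped() {
    # -n prints line numbers; -F treats '[U+' as a fixed string, not a regex.
    grep -nF '[U+' "$1"
}

# Example: find_unmapped file.txt || echo "no leftover markers"
```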