I'm currently searching for an application or a script that does a correct word count for a LaTeX document.

So far I have only found scripts that work on a single file. What I want is a script that safely ignores LaTeX keywords and also traverses linked files, i.e. follows \include and \input links, to produce a correct word count for the whole document.

In Vim I currently use `ggVG` followed by `g CTRL-G`, but that only shows the count for the current file and does not ignore LaTeX keywords.

Does anyone know of any script (or application) for Linux that can do this job?

+1  A: 
latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w

should give you a fairly accurate word count.

aioobe
+1  A: 

I went with icio's comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:

pdftotext file.pdf - | wc -w
Andreas Grech
Be careful with this. I believe a word that is hyphenated across two lines will show up as 2 words, not one. Headers and footers will also be counted. Look at the output from `pdftotext` and see if it is okay for you. If you want an exact count, I would not use this solution.
Geoff
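
For the hyphenation problem specifically, one rough workaround is to glue a line that ends in a hyphen onto the next line before counting. This is only a sketch: `count_dehyphenated` is a made-up helper name, and it assumes that a trailing `-` always marks a split word and that `pdftotext` leaves no spaces after it. It does nothing about headers and footers.

```shell
# Count words on stdin after re-joining words hyphenated across a line break.
# Assumption: a "-" at the very end of a line always marks a split word.
count_dehyphenated() {
  # If the line ends in "-", drop the hyphen and print without a newline so
  # the next line continues the same word; otherwise print the line as usual.
  awk '{ if (sub(/-$/, "")) printf "%s", $0; else print }' | wc -w | tr -d ' '
}
```

Usage would be something like `pdftotext file.pdf - | count_dehyphenated`.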
A: 

I use the following VIM script:

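function! WC()
    " Shell-escape the current file name so paths with spaces still work
    let filename = shellescape(expand("%"))
    " Strip TeX markup with detex, count words, and trim the padding from wc
    let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
    let result = system(cmd)
    echo result . " words"
endfunction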

… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?

The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.

Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.

Konrad Rudolph
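
The "parse the TeX file to get all linked files" idea from this answer could be sketched roughly as below. `list_tex_files` is a hypothetical helper, not part of detex or any other tool. It assumes each `\input`/`\include` takes a single brace-delimited argument, that paths are relative to the current directory, that no include is commented out, and that the files do not include each other in a cycle.

```shell
# Print a TeX file and, recursively, every file it pulls in through
# \input{...} or \include{...}, in the order they are encountered.
# Assumptions: brace-delimited arguments, paths relative to the current
# directory, no commented-out includes, no include cycles.
list_tex_files() {
  file=$1
  # \include{chapter} usually omits the extension, so add .tex when missing.
  case $file in
    *.tex) ;;
    *) file=$file.tex ;;
  esac
  # Silently skip files that cannot be found.
  [ -f "$file" ] || return 0
  printf '%s\n' "$file"
  # Extract the argument of each \input or \include and recurse into it.
  grep -oE '\\(input|include)\{[^}]+\}' "$file" |
    sed -E 's/^\\(input|include)\{//; s/\}$//' |
    while IFS= read -r child; do
      list_tex_files "$child"
    done
}
```

The resulting list could then be fed one file at a time to the same `detex ... | wc -w` pipeline, for example `list_tex_files main.tex | while IFS= read -r f; do detex "$f"; done | wc -w` (with `main.tex` standing in for your top-level file). As far as I know, some detex builds document a `-n` option that stops detex from reading included files itself; if yours follows includes, that would avoid counting a file twice.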
Cheers for the script but following links is a must for me since my document is pretty much structured with `\include`s
Andreas Grech
+2  A: 

I use texcount. The webpage has a Perl script to download (and a manual).

It follows files pulled in with \input or \include (see the -inc option), supports macros, and has many other nice features.

When following included files, you get details for each separate file as well as a total. For example, here is the total output for a 12-page document of mine:

TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19

If you're only interested in the total, use the -total argument.

Geoff
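
Since the summary reports body text, headers, and captions separately, a single figure has to be added up by hand if all three count towards a limit. Below is a small sketch of that; `sum_texcount_words` is a made-up helper, and it assumes the summary uses exactly the labels shown in the output above (other texcount versions may word them differently).

```shell
# Add up the word counts from a texcount summary read on stdin.
# Assumption: the labels match the sample output in this answer
# ("Words in text", "Words in headers", "Words in float captions").
sum_texcount_words() {
  awk -F': *' '
    /^Words in (text|headers|float captions):/ { total += $2 }
    END { print total + 0 }
  '
}
```

Usage would be something like `texcount -inc -total main.tex | sum_texcount_words`, with `main.tex` standing in for the top-level file. For the sample above that is 4188 + 26 + 404 = 4618 words.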
But does it follow links to `\include` and `\input` files?
Andreas Grech
Yes, that's what the `-inc` parameter does (I'll edit my response).
Geoff
Brilliant. Just tested out this script and it works great! Cheers Geoff
Andreas Grech
Cool. I haven't played with the macro support. If you have macros which produce text, you will need to look into that section.
Geoff