This can be in any high-level language that is likely to be available on a typical Unix-like system (Python, Perl, awk, standard Unix utilities such as sort and uniq, etc.). Ideally it should be fast enough to report the total number of unique terms in a 2 MB text file without a noticeable wait.
I only need this for quick sanity-checking, so it doesn't need to be well-engineered.
Remember, the count must be case-insensitive.
Thank you guys very much.
Side note: If you use Python, please don't use version 3-only code. The system I'm running it on only has 2.4.4.
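To make the requirement concrete, here is a rough sketch of the kind of thing I have in mind. It assumes a "term" is simply a whitespace-separated token (punctuation is not stripped), and the function name `count_unique_terms` is made up for illustration. It sticks to constructs that should work on Python 2.4 as well as newer versions (the built-in `set`, no `with` statement, no `print`).

```python
def count_unique_terms(lines):
    """Return the number of distinct terms, ignoring case.

    `lines` is any iterable of strings, for example an open file object.
    Assumption: a term is a whitespace-separated token.
    """
    seen = set()
    for line in lines:
        # Lowercase the whole line first so "The" and "the" collapse
        # into a single term, then split on runs of whitespace.
        for term in line.lower().split():
            seen.add(term)
    return len(seen)
```

Passing an open file object to the function reads the file line by line, so a 2 MB file should not be a problem for memory or speed.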