I have a text file of about 20 million lines. Each line is 25 characters long. I estimate there are roughly 200k-300k unique lines. What I want to find out is exactly how many unique lines there are, and how many occurrences there are of each (I expect the distribution to be power-law-esque).
I could do this:
sort bigfile | uniq -c | sort -nr > uniqcounts
wc -l uniqcounts
but that seems horribly inefficient in both time and memory: it sorts all 20 million lines just to collapse them into a few hundred thousand distinct ones.
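One alternative I've been wondering about is a single-pass awk count, which would only need to keep the ~300k distinct lines in memory and would leave just the per-line counts for the final sort (a sketch, untested at this scale):

awk '{ counts[$0]++ } END { for (line in counts) print counts[line], line }' bigfile | sort -nr > uniqcounts
wc -l uniqcounts

Sorting ~300k count lines should be far cheaper than sorting the full 20 million, but I don't know how awk's associative arrays behave at this size.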
What is your best command-line solution to this problem?