Does the order of uniq and sort matter when calling them in a shell script? I'm asking about time and space usage.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a text file of 140k lines showed a slight speed improvement (5.5 s vs 5.0 s) for the first method (get unique values, then sort).

I don't know how to measure memory usage, though.

The question now is: does the order make a difference? Or does it depend on the lines grep returns (many or few duplicates)?

I'm looking forward to your answers.

+7  A: 

I believe that sort -u is suited to this exact scenario, and will both sort and uniquify things. Obviously, that'll be more efficient than calling sort and uniq individually in either order.
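As a rough sketch of that claim, the snippet below checks that `sort -u` and `sort | uniq` give the same output. The sample input and variable names are made up for illustration, and `LC_ALL=C` is set only to keep the ordering predictable across locales.

```shell
# Made-up sample input with duplicates that are not adjacent.
input=$(printf 'pear\napple\npear\nbanana\napple\n')

# Two processes: sort first, then drop adjacent duplicates.
via_pipeline=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

# One process: sort and de-duplicate in a single step.
via_sort_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

printf '%s\n' "$via_sort_u"
[ "$via_pipeline" = "$via_sort_u" ] && echo "same output"
```

This should print apple, banana and pear on separate lines, followed by "same output".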

mquander
`sort -u` is a great hint, and no doubt it's more efficient than calling the two in either order. But the order does make a difference: `uniq | sort` does not work.
knittl
In a quick test, I found that `sort -u` is about 7% faster than `sort|uniq`.
Dennis Williamson
+5  A: 

The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be

grep 'somePattern' | sort | uniq
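
A small sketch of why the order matters, using made-up input in place of the grep output: uniq only collapses duplicates that sit next to each other, so duplicates that are far apart survive unless the input is sorted first.

```shell
# Made-up input standing in for grep output; duplicates are scattered.
input=$(printf 'b\na\nb\nb\na\n')

# uniq alone only merges the two adjacent "b" lines.
uniq_only=$(printf '%s\n' "$input" | uniq | tr '\n' ' ')

# Sorting afterwards brings the leftover duplicates together but keeps them.
uniq_then_sort=$(printf '%s\n' "$input" | uniq | LC_ALL=C sort | tr '\n' ' ')

# Sorting first makes every duplicate adjacent, so uniq removes them all.
sort_then_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq | tr '\n' ' ')

echo "uniq:        $uniq_only"       # b a b a
echo "uniq | sort: $uniq_then_sort"  # a a b b
echo "sort | uniq: $sort_then_uniq"  # a b
```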
Robert Munteanu
Thanks for clearing that up!
knittl
I've used `| uniq | sort | uniq` when grepping gigabytes' worth of stuff out of sorted files, just to keep the sort from having to sort an excessive amount of data.
Shin
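
The pre-filter idea from this comment can be sketched as follows. The input is made up and tiny; it only illustrates that a leading uniq shrinks what sort has to handle when duplicates already come in runs, while the final result stays the same. Whether this pays off on real data is an assumption that would need measuring.

```shell
# Made-up input with runs of adjacent duplicates, as in already-grouped files.
input=$(printf 'a\na\na\nc\nc\na\nb\nb\n')

# Number of lines sort has to process without and with the uniq pre-filter.
lines_plain=$(printf '%s\n' "$input" | wc -l | tr -d ' ')
lines_prefiltered=$(printf '%s\n' "$input" | uniq | wc -l | tr -d ' ')

# The final result is the same either way.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)
prefiltered=$(printf '%s\n' "$input" | uniq | LC_ALL=C sort | uniq)

echo "sort input: $lines_plain lines vs $lines_prefiltered lines"
[ "$plain" = "$prefiltered" ] && echo "same output"
```

With this input, sort receives 8 lines in the plain pipeline and 4 lines after the pre-filter.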
+2  A: 

uniq depends on the items being sorted to remove duplicates (since it only compares the previous and current item), which is why sort is always run before uniq. Try it and see.

Sven Schott
Oh, OK. That makes sense :) Thanks.
knittl