Is there a way to delete duplicate lines in a file in Unix?

I can do it with the sort -u and uniq commands, but I want to use sed or awk. Is that possible?

A: 

There is a really convenient GNOME program: FSlint

Carlos Tasada
I think it finds duplicate files, not duplicate lines within a file.
Michael Krelin - hacker
+4  A: 

From http://sed.sourceforge.net/sed1line.txt: (Please don't ask me how this works ;-) )

 # delete duplicate, consecutive lines from a file (emulates "uniq").
 # First line in a set of duplicate lines is kept, rest are deleted.
 sed '$!N; /^\(.*\)\n\1$/!P; D'

 # delete duplicate, nonconsecutive lines from a file. Beware not to
 # overflow the buffer size of the hold space, or else use GNU sed.
 sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Andre Miller
hi Andre can you tell me how this works?
Vijay Sarathi
Geekery ;-) +1, but the resource consumption is unavoidable.
Michael Krelin - hacker
'$!N; /^\(.*\)\n\1$/!P; D' means "If you're not at the last line, read in another line. Now look at what you have, and if it ISN'T stuff followed by a newline and then the same stuff again, print out the stuff up to the newline. Now delete the stuff up to the newline and start over with whatever is left, without reading a new line first."
Beta
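A minimal sketch of Beta's reading in practice: the one-liner is wrapped in a shell function (the name uniq_sed is hypothetical, chosen only for illustration) and fed a short sample. Only adjacent repeats are removed, so the final "a" survives, just as it would with uniq.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed1line "uniq" one-liner.
#   $!N               : unless on the last line, append "\n" + next line
#   /^\(.*\)\n\1$/!P  : unless the two lines are identical, print up to
#                       the first newline (the older line)
#   D                 : delete up to the first newline and restart the
#                       cycle with what is left, without reading input;
#                       if no newline is left, start a normal new cycle
uniq_sed() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

# Only consecutive repeats go away: prints a, b, a (one per line).
printf 'a\na\nb\na\n' | uniq_sed
```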
'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' means, roughly, "Append the whole hold space to this line, then if you see a duplicated line, throw the whole thing out; otherwise copy the whole mess back into the hold space and print the first part (which is the line you just read)."
Beta
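The same idea as a runnable sketch, again with a hypothetical function name (dedupe_sed). It assumes LC_ALL=C so that the bracket range [ -~] means printable ASCII; under that assumption a line containing a tab or a non-ASCII byte never matches the duplicate test and is therefore never removed. The hold space grows with every distinct line kept, which is what the buffer-size warning in the sed1line comment is about.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed1line one-liner that removes
# non-consecutive duplicates, keeping the first occurrence.
# The hold space holds every line kept so far, each ended by "\n".
#   G          : append "\n" + hold space to the current line
#   s/\n/&&/   : double that first newline, so every stored line is
#                preceded by "\n"
#   /^\([ -~]*\n\).*\n\1/d : current line already stored, so drop it
#   s/\n//     : undo the doubling
#   h          : copy current line + old hold space back to hold
#   P          : print up to the first newline (the current line)
# Assumption: LC_ALL=C, so [ -~] is printable ASCII only.
dedupe_sed() {
  LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' "$@"
}

# First occurrences are kept in input order: prints b, a, c.
printf 'b\na\nb\nc\na\n' | dedupe_sed
```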
+13  A: 
awk '!x[$0]++' file.txt
Jonas Elfström
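The awk one-liner expanded with comments, as a sketch (dedupe_awk is a hypothetical name used only for illustration). Unlike sort -u it keeps the original line order, and it stores one array entry per distinct line, so memory grows with the number of unique lines rather than the file size.

```shell
#!/bin/sh
# Hypothetical wrapper around awk '!x[$0]++'.
#   x[$0]    : associative array keyed by the whole input line
#   x[$0]++  : post-increment, so it yields the count *before* adding 1;
#              an unseen key starts at 0, which is false
#   !        : turns that first-time 0 into true
# A pattern with no action block uses the default action, print $0.
dedupe_awk() {
  awk '!x[$0]++' "$@"
}

# First occurrences are kept in input order: prints b, a, c.
printf 'b\na\nb\nc\na\n' | dedupe_awk
```

Because the whole line ($0) is the key, lines that differ only in whitespace are treated as different lines.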
Why set the field separator to colon?
Jonathan Leffler
I was going to suggest awk '!seen[$0] { seen[$0] = 1; print $0 }' but this is much better.
Pillsy
beautiful, but +1 to Jonathan Leffler's question.
Michael Krelin - hacker
I edited an old awk script that removed duplicates based on a field and forgot to remove the FS=":".
Jonas Elfström
heh, thought so ;-)
Michael Krelin - hacker
Cool. 987654321
DigitalRoss
A: 

sed is a stream editor that works on one line at a time and doesn't naturally remember the lines it has already seen.

It may be possible, but I think other techniques will be easier.

EDIT: well, Andre Miller's answer shows it's not, erm, intuitive.

pavium