ansaurus

Question

How can I delete duplicate lines in a file in Unix?

Answer 1

A:

There is a really convenient Gnome program: FSlint

Carlos Tasada 2009-09-18 13:02:56

I think it finds duplicate files, not duplicate lines in a file.

Michael Krelin - hacker 2009-09-18 13:17:30

Answer 2

+4 A:

From http://sed.sourceforge.net/sed1line.txt: (Please don't ask me how this works ;-) )

 # delete duplicate, consecutive lines from a file (emulates "uniq").
 # First line in a set of duplicate lines is kept, rest are deleted.
 sed '$!N; /^\(.*\)\n\1$/!P; D'

 # delete duplicate, nonconsecutive lines from a file. Beware not to
 # overflow the buffer size of the hold space, or else use GNU sed.
 sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

Andre Miller 2009-09-18 13:04:41

Hi Andre, can you tell me how this works?

Vijay Sarathi 2009-09-18 13:06:11

Geekery ;-) +1, but resource consumption is unavoidable.

Michael Krelin - hacker 2009-09-18 13:16:24

'$!N; /^\(.*\)\n\1$/!P; D' means "If you're not at the last line, read in another line. Now look at what you have and if it ISN'T stuff followed by a newline and then the same stuff again, print out the stuff. Now delete the stuff (up to the newline)."

Beta 2009-09-18 15:30:03

'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' means, roughly, "Append the whole hold space to this line, then if you see a duplicated line throw the whole thing out; otherwise copy the whole mess back into the hold space and print the first part (which is the line you just read)."

Beta 2009-09-18 15:41:33
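To see both commands from this answer in action, here is a small sketch on made-up input. The function names dedupe_consecutive and dedupe_all are hypothetical wrappers added for illustration; the sed scripts are copied from the answer, and the second one is assumed to run under GNU sed.

```shell
#!/bin/sh
# Hypothetical wrapper names; the sed scripts are the ones quoted in the answer.

# Drop only adjacent repeats (like uniq).
dedupe_consecutive() {
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

# Drop every later repeat, adjacent or not (assumes GNU sed).
# Only lines made entirely of printable ASCII ([ -~]) are compared.
dedupe_all() {
    sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' "$@"
}

# The second "a" pair collapses, but the later, non-adjacent "a" survives.
printf 'a\na\nb\na\n' | dedupe_consecutive

# Only the first "a" and the first "b" survive.
printf 'a\nb\na\nc\nb\n' | dedupe_all
```

Note that the second script keeps every line seen so far in the hold space, so memory use grows with the number of distinct lines. Lines containing tabs or non-ASCII characters fall outside [ -~] and will probably never be treated as duplicates.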

Answer 3

+13 A:

awk '!x[$0]++' file.txt

Jonas Elfström 2009-09-18 13:07:47
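A note on why the one-liner works, offered as a sketch rather than the author's own explanation: x[$0] is an associative array indexed by the whole line. A line not seen before reads as 0, so !x[$0] is true and awk's default action prints it; the ++ then bumps the count so later copies test false. The spelled-out form below should behave the same way; the names seen and dedupe_lines are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper; intended to match awk '!x[$0]++', written out step by step.
dedupe_lines() {
    awk '{
        if (seen[$0] == 0) {   # first time this exact line appears
            print $0
        }
        seen[$0]++             # count it so later copies are skipped
    }' "$@"
}

printf 'a\nb\na\nc\nb\n' | dedupe_lines
```

awk writes to standard output and leaves file.txt untouched. To update the file itself, one option is to redirect to a temporary file and move it over the original, for example: awk '!x[$0]++' file.txt > file.tmp && mv file.tmp file.txt (file.tmp is an arbitrary name).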

Why set the field separator to colon?

Jonathan Leffler 2009-09-18 13:17:36

I was going to suggest awk '!seen[$0] { seen[$0] = 1; print $0 }' but this is much better.

Pillsy 2009-09-18 13:17:44

beautiful, but +1 to Jonathan Leffler's question.

Michael Krelin - hacker 2009-09-18 13:20:09

I edited an old awk script that removed duplicates based on a field and forgot to remove the FS=":".

Jonas Elfström 2009-09-18 13:25:05
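For context, a script that removes duplicates based on a field, as mentioned in the comment above, might look like the sketch below. The original script was not posted, so the choice of the first field as the key, the function name, and the sample data are all assumptions.

```shell
#!/bin/sh
# Hypothetical example: keep the first line for each distinct value of field 1,
# with fields separated by ":" (the FS=":" mentioned in the comment).
dedupe_by_first_field() {
    awk -F: '!seen[$1]++' "$@"
}

# The third line repeats the key "root", so it is dropped.
printf 'root:0:admin\nbob:1:user\nroot:2:other\n' | dedupe_by_first_field
```

For whole-line deduplication the field separator is irrelevant, because $0 is the entire line regardless of how it is split.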

heh, thought so ;-)

Michael Krelin - hacker 2009-09-18 14:06:42

Cool. 987654321

DigitalRoss 2009-09-21 18:17:43

Answer 4

A:

sed is a stream editor that works through its input one line at a time, so it doesn't naturally compare a line against the others in the file.

It may be possible, but I think other techniques will be easier.

EDIT: well, Andre Miller's answer shows it's not, erm, intuitive

pavium 2009-09-18 13:07:49
