ansaurus

Question

To find repeated matches by `uniq -d`

Answer 1

+1 A:

Try this to double-check. It outputs each line that is duplicated:

  cat /tmp/1 | awk 'seen[$0]++ == 1'

Oh, I see your problem. Use this instead:

  cat /tmp/1 | sort | uniq -d

Sort the input before running uniq!

Sean A.O. Harney 2009-07-25 11:25:09

no need to use cat.
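
As a rough illustration of this point, the sketch below runs the same pipeline with and without cat. The temporary file and its contents are made up for the example; only the `sort | uniq -d` pipeline comes from the thread.

```shell
# Hypothetical sample data, written to a temporary file for this sketch.
sample=$(mktemp)
printf '%s\n' b a b c a > "$sample"

# The original form: cat feeds the file into the pipeline.
with_cat=$(cat "$sample" | sort | uniq -d)

# sort can read the file itself when given the name as an argument...
as_argument=$(sort "$sample" | uniq -d)

# ...or the shell can redirect the file to sort's standard input.
with_redirect=$(sort < "$sample" | uniq -d)

printf '%s\n' "$with_cat" "$as_argument" "$with_redirect"

rm -f "$sample"
```

All three forms should report the same duplicated lines; the variants without cat simply start one process fewer.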

ghostdog74 2009-07-25 11:34:38

Lines 2 and 7 of Masi's sample file are the same. But they're not on consecutive lines, which appears to be the heart of the misunderstanding.

dave 2009-07-25 11:35:57

ghostdog, I am using cat because the OP did. Yes, I am aware I could use shell redirection instead, or pass the file as a command-line argument to awk or sort.

dave, thanks. I didn't see that one! Edited.

Sean A.O. Harney 2009-07-25 11:54:04

Answer 2

+3 A:

You have to sort your data before you use uniq. It only removes/detects duplicates on adjacent lines.
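
A minimal sketch of the adjacency behaviour, assuming a made-up four-line file in which the repeated line is not on consecutive lines (the file name and contents are illustrative, not the OP's data):

```shell
# Build a small sample file whose duplicate lines are NOT adjacent.
sample=$(mktemp)
printf '%s\n' apple banana cherry banana > "$sample"

# uniq only compares neighbouring lines, so nothing is reported here.
unsorted_dups=$(uniq -d "$sample")

# Sorting first puts the two "banana" lines next to each other.
sorted_dups=$(sort "$sample" | uniq -d)

echo "without sort: '$unsorted_dups'"
echo "with sort:    '$sorted_dups'"

rm -f "$sample"
```

Without the sort step the result is empty, because no two neighbouring lines match; with it, the repeated line is reported.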

dave 2009-07-25 11:33:18

Or use an awk script to do the job properly?

Douglas Leeder 2009-07-25 11:49:17

Thank you for pointing that out! --- It even says in the manual `The uniq utility reads the specified input_file comparing adjacent lines - -.`

Masi 2009-07-25 12:06:28

With my GNU coreutils uniq, the manual says: "Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output)."

Sean A.O. Harney 2009-07-25 14:52:40

Answer 3

A:

awk '{_[$0]++}END{for(i in _)if(_[i]>1) print i}' /tmp/1

or just

awk '_[$0]++ == 1' file

ghostdog74 2009-07-25 11:34:06

awk '_[$0]++' only works if each duplicated line occurs exactly twice. If you had three rows that were the same, that line would be printed twice, once for every occurrence after the first.
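
To make the difference concrete, here is a small sketch comparing the two awk conditions on a hypothetical file where one line occurs three times. The array name `seen` and the sample data are chosen for the illustration.

```shell
# Hypothetical sample: "x" occurs three times, "y" once.
sample=$(mktemp)
printf '%s\n' x y x x > "$sample"

# True on every occurrence after the first, so "x" is printed twice.
every_repeat=$(awk 'seen[$0]++' "$sample")

# True only on the second occurrence, so "x" is printed once.
second_only=$(awk 'seen[$0]++ == 1' "$sample")

echo "seen[\$0]++      printed $(printf '%s\n' "$every_repeat" | wc -l) line(s)"
echo "seen[\$0]++ == 1 printed $(printf '%s\n' "$second_only" | wc -l) line(s)"

rm -f "$sample"
```

The `== 1` test fires only when the counter held 1 before the increment, which is why each duplicated line is reported a single time regardless of how often it repeats.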

Sean A.O. Harney 2009-07-25 14:51:09
