What I want to accomplish: print duplicated lines

This is what uniq man says:

SYNOPSIS

uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

Discard all but one of successive identical lines from INPUT (or stan-
dard input), writing to OUTPUT (or standard output).

...

-d, --repeated
  only print duplicate lines

This is what I try to execute:

root@laptop:/var/www# cat file.tmp 
Foo
Bar
Foo
Baz
Qux
root@laptop:/var/www# cat file.tmp | uniq --repeated
root@laptop:/var/www# 

So I was expecting Foo in this example, but it prints nothing. What is wrong with this snippet?

+3  A: 

uniq only checks consecutive lines against each other. So you can only expect to see something printed if there are two or more Foo lines in a row, for example.

If you want to get around that, sort the file first with sort.

$ sort file.tmp | uniq -d
Foo
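To make the difference visible side by side, here is a small sketch that recreates the sample file from the question in a temporary file (the variable names tmp, unsorted and sorted are just for illustration) and runs uniq -d on both the unsorted and the sorted input:

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf 'Foo\nBar\nFoo\nBaz\nQux\n' > "$tmp"

# Unsorted: the two Foo lines are not adjacent, so uniq -d reports nothing.
unsorted=$(uniq -d "$tmp")

# Sorted: identical lines become adjacent, so Foo is reported once.
sorted=$(sort "$tmp" | uniq -d)

printf 'unsorted: [%s]\n' "$unsorted"
printf 'sorted:   [%s]\n' "$sorted"

rm -f "$tmp"
```

The first result should be empty and the second should contain just Foo.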

If you really need to have all the non-consecutive duplicate lines printed in the order they occur in the file, you can use awk for that:

$ awk '{ if ($0 in lines) print $0; lines[$0]=1; }' file.tmp

but for a large file, that may be less efficient than sort and uniq. (May be - I haven't tried.)
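Note that the awk one-liner above prints a line every time it shows up again, so a line that occurs three times is printed twice. If you would rather see each duplicated line only once, a counter can be used instead. This is only a sketch: the array name seen and the sample input with three Foo lines are made up for illustration.

```shell
# Sample input in which Foo occurs three times (hypothetical data).
tmp=$(mktemp)
printf 'Foo\nBar\nFoo\nBaz\nFoo\n' > "$tmp"

# Variant from the answer: prints a line on every repeated occurrence.
every=$(awk '{ if ($0 in lines) print $0; lines[$0]=1; }' "$tmp")

# Counter variant: seen[$0]++ yields the count before incrementing,
# so the condition is true only on the second occurrence of a line.
once=$(awk 'seen[$0]++ == 1' "$tmp")

printf 'every repeat:\n%s\n' "$every"
printf 'once per duplicate:\n%s\n' "$once"

rm -f "$tmp"
```

Both variants keep the order in which the duplicates appear in the file, at the cost of holding every distinct line in memory.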

David Zaslavsky
+2  A: 

cat file.tmp | sort | uniq --repeated

or

sort file.tmp | uniq --repeated

anthony
+1  A: 
cat file.tmp | sort | uniq --repeated

The lines need to be sorted.

pejuko
This should work, but the `cat` is completely unnecessary. `sort` is quite capable of reading its own input. David Z. has the correct idea.
Carl Smotricz
Yes, you are right, but if you always use cat you don't need to remember which commands can read files and which can't :-)
pejuko
Yeah, sometimes I like to have the `cat` in the beginning for consistency with other commands, but Carl's right that it is, strictly speaking, unnecessary. (@pejuko: there's also the option of shell input redirection, e.g. `sort < file.tmp | uniq -d`, even for commands that can't read files)
David Zaslavsky
@David: yep, I know about this redirection and I use it with single commands. With multiple commands I like pipes. Well, I think we should stop here, otherwise there will be a big flame war about cat, pipes and redirections and which is the right religion.
pejuko
+1  A: 

uniq operates on adjacent lines. What you want is

cat file.tmp | sort | uniq --repeated

On OS X, I would actually use

sort file.tmp | uniq -d

Jamie Wong
-1 for invoking `cat` where it's utterly unnecessary. Either `sort file.tmp` (as you have) or `sort < file.tmp` does the same thing without creating an extra process.
Gabe
@Gabe - I was using the same format as OP. Also, you'll see I opted to not use cat in the second example
Jamie Wong
+1  A: 

I've never tried this myself, but I think the word "successive" is the key.

This would probably work if you sorted the input before running uniq over it.

Something like

sort file.tmp | uniq -d
Carl Smotricz