views: 20
answers: 1

I've got a CSV file where the leftmost column contains an ID field. Is there a clever way I can use a utility program such as sed to find any IDs that are used more than once?

+3  A: 

If all you want is the IDs, then you can try

cut -d "," -f $NUM | sort -n | uniq -d

where file.csv is your input and $NUM is the number of the field containing the ID. The cut command extracts a list of IDs, sort brings duplicates together on adjacent lines, and uniq -d prints only those that appear more than once.
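
For instance, with the ID in the first field of a small sample file (the name ids.csv and its contents are made up here, just for illustration):

$ cat ids.csv
101,alice
102,bob
101,carol
103,dave
$ cut -d "," -f 1 ids.csv | sort -n | uniq -d
101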

Michael Mior
Note that you need to `sort` the items before passing them to `uniq`; `uniq` only compares adjacent lines.
Brian Campbell
Oh, right. Thanks @Brian. Updated answer.
Michael Mior
You can add `-c` to `uniq` (or substitute it for `-d`) to get a count of the duplicates.
Dennis Williamson
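
A quick sketch of what that looks like, reusing the made-up ids.csv from above (`-c` alone counts every ID; combining it with `-d` limits the output to the duplicated ones):

$ cut -d "," -f 1 ids.csv | sort -n | uniq -c
      2 101
      1 102
      1 103
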
Note that `uniq` can't currently operate on a particular field, so I would remove the `cut -f $NUM` portion from the pipeline and just add a note that if all you want is the IDs, you can put it back.
pixelbeat
@pixelbeat I don't understand your comment. I realize `uniq` can't operate on a particular field, which is why I suggested `cut`ting things up first. If you remove the `cut` command, `uniq` won't look for duplicated IDs but for duplicated whole rows, which I don't think is what the OP wants.
Michael Mior
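
To make that distinction concrete, again with the made-up ids.csv from above: comparing whole rows finds nothing, because no complete line repeats, even though the ID 101 does:

$ sort ids.csv | uniq -d
$ cut -d "," -f 1 ids.csv | sort -n | uniq -d
101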