I've got a CSV file where the leftmost column contains an ID field. Is there a clever way I can use a utility program such as sed to find any IDs that are used more than once?
A:
If all you want is the IDs, then you can try
cut -d, -f "$NUM" file.csv | sort -n | uniq -d
where file.csv is your input and $NUM is the number of the field containing the ID (1 for the leftmost column here). The cut command will extract the list of IDs, sort will group identical IDs together, and uniq -d will show you only those which are duplicated.
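For example, with the ID in the leftmost column (so $NUM is 1) and a small made-up file.csv:
$ cat file.csv
42,alice
17,bob
42,carol
99,dave
$ cut -d, -f 1 file.csv | sort -n | uniq -d
42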
Michael Mior
2010-09-23 23:51:35
Note that you need to `sort` the items before passing them to `uniq`; `uniq` only compares adjacent lines.
Brian Campbell
2010-09-24 00:02:15
Oh, right. Thanks @Brian. Updated answer.
Michael Mior
2010-09-24 00:07:10
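To illustrate why the `sort` matters, here is a hypothetical run; without it, `uniq -d` misses the two 42s because they aren't adjacent:
$ printf '42\n17\n42\n' | uniq -d    # prints nothing
$ printf '42\n17\n42\n' | sort -n | uniq -d
42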
You can add `-c` to `uniq` to get a count alongside each duplicated ID (or substitute it for `-d` to get a count for every ID, duplicated or not).
Dennis Williamson
2010-09-24 02:13:37
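Continuing the hypothetical example above, `uniq -dc` prints each duplicated ID with its count, while plain `uniq -c` would count every ID:
$ printf '42\n17\n42\n42\n99\n' | sort -n | uniq -dc
      3 42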
Note that `uniq` can't currently operate on a particular field, so I would remove the `-f $NUM` portion from the `cut` command, and just add a note that if all you want is the IDs, then you can add that.
pixelbeat
2010-09-24 13:46:53
@pixelbeat I don't understand your comment. I realize `uniq` can't operate on a particular field, which is why I suggested `cut`ting things up first. If you remove the `cut` command, `uniq` will compare whole rows rather than just the IDs, which I don't think is what the OP wants.
Michael Mior
2010-09-24 15:52:00