I've got a CSV file where the leftmost column contains an ID field. Is there a clever way I can use a utility program such as sed to find any IDs that are used more than once?
A:
If all you want is the IDs, then you can try
cut -d, -f "$NUM" file.csv | sort -n | uniq -d
where file.csv is your input and $NUM is the number of the field containing the ID (1 for the leftmost column here). The cut command will extract the list of IDs, sort will group identical IDs together, and uniq -d will show you only those which are duplicated.
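For example, with the ID in the leftmost column (so $NUM is 1) and a small made-up file.csv:
$ cat file.csv
42,alice
17,bob
42,carol
99,dave
$ cut -d, -f 1 file.csv | sort -n | uniq -d
42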
Michael Mior
2010-09-23 23:51:35
Note that you need to `sort` the items before passing them to `uniq`; `uniq` only compares adjacent lines.
Brian Campbell
2010-09-24 00:02:15
Oh, right. Thanks @Brian. Updated answer.
Michael Mior
2010-09-24 00:07:10
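To illustrate why the `sort` matters, here is a hypothetical run; without it, `uniq -d` misses the two 42s because they aren't adjacent:
$ printf '42\n17\n42\n' | uniq -d    # prints nothing
$ printf '42\n17\n42\n' | sort -n | uniq -d
42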
You can add `-c` to `uniq` to get a count alongside each duplicated ID (or substitute it for `-d` to get a count for every ID, duplicated or not).
Dennis Williamson
2010-09-24 02:13:37
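Continuing the hypothetical example above, `uniq -dc` prints each duplicated ID with its count, while plain `uniq -c` would count every ID:
$ printf '42\n17\n42\n42\n99\n' | sort -n | uniq -dc
      3 42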
Note that `uniq` can't currently operate on a particular field, so I would remove the `-f $NUM` portion from the `cut` command, and just add a note that if all you want is the IDs, then you can add that.
pixelbeat
2010-09-24 13:46:53
@pixelbeat I don't understand your comment. I realize `uniq` can't operate on a particular field, which is why I suggested `cut`ting things up first. If you remove the `cut` command, `uniq` will compare whole rows rather than just the IDs, which I don't think is what the OP wants.
Michael Mior
2010-09-24 15:52:00