views: 102
answers: 5
I have a text file with a large amount of tab-delimited data. I want to have a look at the data so that I can see the unique values in a column. For example,

Red     Ball 1 Sold
Blue    Bat  5 OnSale
............... 

So, it's like the first column has colors: I want to know how many different unique values there are in that column, and I want to be able to do that for each column.

I need to do this on the Linux command line, so probably using a bash script, sed, awk or something.

Addendum: Thanks everyone for the help. Can I ask one more thing? What if I wanted a count of these unique values as well?

I guess I didn't put the second part clearly enough. What I wanted was a count of "each" of these unique values, not to know how many unique values there are. For instance, in the first column I want to know how many Red, Blue, Green etc. coloured objects there are.

+7  A: 

You can make use of the cut, sort and uniq commands as follows:

cat input_file | cut -f 1 | sort | uniq

This gets the unique values in field 1; replacing 1 with 2 will give you the unique values in field 2.

Avoiding UUOC :)

cut -f 1 input_file | sort | uniq

EDIT:

To count the number of unique values, you can add the wc command to the chain:

cut -f 1 input_file | sort | uniq | wc -l
codaddict
Useless use of `cat` award for the day :-)
Douglas Leeder
@Douglas: Award accepted :)
codaddict
you can also use `sort -u` instead of `sort | uniq`
Hasturkun
`uniq -c` will give the counts per item - `wc -l` will count the total number of items.
Dennis Williamson
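
Combining those two comments, a minimal sketch that prints a per-value count for the first column (assuming the same tab-delimited input_file):

cut -f 1 input_file | sort | uniq -c
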
A: 

AWK is your friend. You can write simple one-off programs for this kind of thing on the command line. Read its manual (man gawk) and you'll be enlightened.
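
For example, a one-off program that prints each distinct value in the first column together with its count (a sketch, assuming a tab-delimited file called test.txt):

awk -F'\t' '{ count[$1]++ } END { for (v in count) print count[v], v }' test.txt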

Ian
+3  A: 

You can use awk, sort & uniq to do this. For example, to list all the unique values in the first column:

awk < test.txt '{print $1}' | sort | uniq

As posted elsewhere, if you want to count how many unique values there are, you can pipe the unique list into wc -l:
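
For instance (a sketch, reusing the same test.txt):

awk < test.txt '{print $1}' | sort | uniq | wc -l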

Jon Freedman
A: 

Assuming the data file is actually tab-separated, not space-aligned:

<test.tsv awk '{print $4}' | sort | uniq

Where, for the first sample line, the fields are:

  • $1 - Red
  • $2 - Ball
  • $3 - 1
  • $4 - Sold
Douglas Leeder
+1  A: 
# COLUMN is integer column number
# INPUT_FILE is input file name

cut -f ${COLUMN} < ${INPUT_FILE} | sort -u | wc -l
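
To run it for every column, for example (a sketch, assuming the sample data has 4 tab-delimited columns in test.txt):

for COLUMN in 1 2 3 4; do
    printf 'column %s: ' "${COLUMN}"
    cut -f ${COLUMN} < test.txt | sort -u | wc -l
done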
stacker