views: 17
answers: 1
Given a tab-delimited text file with two columns, "name" and "date":

Sam     20100101
Sam     20100101
Sam     20100102
George  20100101
George  20100101
George  20100102

Can I use awk to add a third column with a partitioned running total, i.e. start a counter at 0 for each distinct combination of columns one and two, and increment it on every occurrence of that combination?

Sam     20100101     1
Sam     20100101     2
Sam     20100102     1
George  20100101     1
George  20100101     2
George  20100102     1

I do this in SQL with:

select row_number() over (partition by column1, column2 order by column10) from table

but I need an alternative - not necessarily awk, it just came to mind - for text files that are not processed through SQL. The files are ~50 GB each, with about 200 million rows.

I use the counter to remove duplicate rows. I am aware that, for my example above, this can be done with:

sort myfile.txt | uniq -u

but my text files have additional columns, omitted from my example for brevity, that may keep otherwise-matching rows from being fully duplicated.

Should I be adding parameters to match only the name and date columns in the

sort myfile.txt | uniq -u

command? Awk? Something else?
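
For instance, would GNU sort's key options alone do it? A sketch of what I have in mind, assuming tab delimiters and that keeping the first row for each name/date pair is acceptable:

# GNU sort: with keys given, -u keeps only the first line of each run that compares equal on those keys
sort -t $'\t' -k1,1 -k2,2 -u myfile.txt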

+3  A: 

If all the lines have identical formatting (including whitespace), something as simple as this will do the trick:

awk '{print $0, ++a[$0]}'

I am not sure about the performance on GB-sized files, since this needs to store a number per unique row. If your data is sorted, you can get away with a fixed amount of storage:

awk '{ if ($0 != prev) n = 0; print $0, ++n; prev = $0 }'
schot
Thanks for this! Not all the lines have identical formatting. Can I switch the ++a[$0] to something that will match on the first two columns only? ++a[$1$2]?
M. Roessler
@M. Roessler Yes, that will work fine.
schot
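
One caveat on the key: plain concatenation like a[$1$2] can collide (fields "ab", "c" and "a", "bc" build the same string), whereas a[$1,$2] joins the fields with awk's SUBSEP character and avoids that. As a sketch of the full de-duplication step, assuming tab-separated input and that one copy of each name/date pair should survive (note that uniq -u would instead drop every line that repeats at all):

# unsorted input: print a row only the first time its name/date pair is seen
awk -F'\t' '++seen[$1,$2] == 1' myfile.txt

# pre-sorted input: same result with constant memory
awk -F'\t' '{ key = $1 SUBSEP $2; if (key != prev) print; prev = key }' myfile.txt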