Is there a nice bash one-liner to map each distinct string in a file to a unique number?

For instance,

a
a
b
b
c
c

should be converted into

1
1
2
2
3
3

I am currently implementing it in C++ but a bash one-liner would be great.

+10  A: 
awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'

This maintains an associative array called ids. Each time it finds a new string, it assigns it the next monotonically increasing id, ++i.

Example:

jkugelman$ echo $'a\nb\nc\na\nb\nc' | awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'
1
2
3
1
2
3
John Kugelman
@John: Awesome... Thanks! Just finished my C++ version too :)
Legend
Nice one! You beat me to it *and* did it in one line. +1 for showing me that you don't have to initialize a variable in Awk.
larsmans
Come to think of it, in my solution I used the `seen` variable uninitialized, without even thinking about it...
larsmans
+2  A: 
awk 'BEGIN { num = 0; }
{
    if ($0 in seen) {
        print seen[$0];
    } else {
        seen[$0] = ++num;
        print num;
    }
}' [file]

(Not exactly one line, of course.)

larsmans
@larsman: Yeah true but that makes it more legible :) Thanks!
Legend
+3  A: 

The awk solutions here are fine, but here's the same approach in pure bash (>= 4, which is needed for associative arrays):

declare -A stringmap
counter=0
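# Redirect on "done" so the file is opened once; redirecting the read
# itself would reopen the file and re-read its first line forever.
while IFS= read -r string; do
    if [[ -z ${stringmap[$string]} ]]; then
        let counter+=1
        stringmap[$string]=$counter
    fi
done < INPUTFILE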
for string in "${!stringmap[@]}"; do
    printf "%d -> %s\n" "${stringmap[$string]}" "$string"
done
Daenyth
@Daenyth: +1 love it! :)
Legend
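The pure-bash loop above prints the string-to-id table at the end rather than one id per input line. A minimal sketch of a variant that emits the id for each line, as in the question's example, might look like the following. The function name `map_lines` and the `x` key prefix are assumptions made for illustration, not part of the original answer; the prefix is there because bash rejects an empty associative-array subscript, so a blank input line would otherwise fail.

```bash
#!/usr/bin/env bash
# Sketch only: requires bash >= 4 for associative arrays.
# map_lines is a hypothetical helper name, not from the answer above.
map_lines() {
    local -A ids=()
    local line key counter=0
    # "|| [[ -n $line ]]" also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        key="x$line"                      # prefix keeps an empty line a valid subscript
        if [[ -z ${ids[$key]+set} ]]; then
            ids[$key]=$((++counter))      # first time this string is seen
        fi
        printf '%s\n' "${ids[$key]}"      # emit the id for every input line
    done
}

# Reads standard input, so a file is passed with a redirect:
#   map_lines < INPUTFILE
```

Ids are handed out in order of first appearance, so the same string always gets the same number within one run.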
+2  A: 

A slight modification, without the if:

awk '!($0 in ids){ids[$0]=++i}{print ids[$0]}' file
ghostdog74
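For reuse, the pattern-action form above can be wrapped in a small shell function. This is only a sketch: `number_strings` is a hypothetical name, and the counter is called `n` here purely for readability. It reads the files named as arguments, or standard input when none are given.

```bash
# number_strings is a hypothetical helper name, not taken from the answers.
number_strings() {
    # "$0 in ids" tests membership without creating the array entry.
    # The first action runs only for unseen lines; the second runs for every line.
    awk '!($0 in ids) { ids[$0] = ++n } { print ids[$0] }' "$@"
}

# The question's sample input, piped through the helper.
printf 'a\na\nb\nb\nc\nc\n' | number_strings
```

Numbers are assigned in order of first appearance, so the mapping depends on the order of lines in the input, not on any sorting of the strings.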