Is there a nice bash one-liner to map strings inside a file to a unique number?
For instance,
a
a
b
b
c
c
should be converted into
1
1
2
2
3
3
I am currently implementing it in C++ but a bash one-liner would be great.
awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'
This maintains an associative array called ids. Each time it finds a new string, it assigns it a monotonically increasing id with ++i.
Example:
jkugelman$ echo $'a\nb\nc\na\nb\nc' | awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'
1
2
3
1
2
3
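To reproduce the exact output asked for in the question, the same awk program can be run directly on a file instead of on echo output. A minimal sketch, assuming the input is stored one string per line in a file called input.txt (a hypothetical name):

```shell
#!/usr/bin/env bash
# Recreate the sample input from the question (input.txt is a hypothetical file name).
printf '%s\n' a a b b c c > input.txt

# Same program as above: a string gets the next id (++i) the first time it is seen,
# and every line prints the id stored for its string.
awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}' input.txt
```

This prints 1, 1, 2, 2, 3, 3, one per line. Ids follow the order of first appearance, so the input does not need to be sorted or grouped.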
awk 'BEGIN { num = 0; }
{
if ($0 in seen) {
print seen[$0];
} else {
seen[$0] = ++num;
print num;
}
}' [file]
(Not exactly one line, of course.)
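Since the program no longer fits on one line, one option is to keep it in its own file and pass it to awk with -f. A sketch, where number.awk and input.txt are hypothetical file names; the BEGIN block is left out because awk treats an unset variable as 0 in a numeric context, so ++num still starts at 1:

```shell
#!/usr/bin/env bash
# Save the awk program to a file (number.awk is a hypothetical name).
cat > number.awk <<'EOF'
{
    if ($0 in seen) {
        print seen[$0]      # string already has an id: reuse it
    } else {
        seen[$0] = ++num    # first occurrence: assign the next id
        print num
    }
}
EOF

# Sample input from the question (input.txt is a hypothetical name).
printf '%s\n' a a b b c c > input.txt
awk -f number.awk input.txt
```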
The awk solutions here are fine, but here is the same approach in pure bash (version 4 or later, which added associative arrays):
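declare -A stringmap
counter=0
while IFS= read -r string; do
  # First time this string is seen: give it the next id
  if [[ -z ${stringmap[$string]} ]]; then
    ((++counter))
    stringmap[$string]=$counter
  fi
  printf '%d\n' "${stringmap[$string]}"   # one id per input line
done < INPUTFILE
# Optional: dump the string -> id table to stderr
# (associative-array iteration order is unspecified)
for string in "${!stringmap[@]}"; do
  printf "%d -> %s\n" "${stringmap[$string]}" "$string" >&2
done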
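Wrapped in a function that reads standard input, the bash approach can be dropped into a pipeline just like the awk one-liners. A sketch assuming bash 4 or later; map_ids is a hypothetical name. Keys are stored with a leading underscore so that a blank input line never produces an empty associative-array subscript, which bash may reject:

```shell
#!/usr/bin/env bash
# map_ids (hypothetical name): print one numeric id per input line,
# numbering distinct lines in order of first appearance. Needs bash >= 4.
map_ids() {
    local -A ids=()
    local n=0 line key
    while IFS= read -r line; do
        key="_$line"                      # avoid an empty subscript for blank lines
        if [[ -z ${ids[$key]} ]]; then
            ids[$key]=$((++n))            # first occurrence: next id
        fi
        printf '%d\n' "${ids[$key]}"
    done
}

printf '%s\n' a a b b c c | map_ids
```

IFS= and -r keep leading whitespace and backslashes intact, so lines that differ only in those characters still get different ids.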
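A slight modification that avoids the explicit if: the first rule runs only for strings not yet in ids, and the second rule, having no pattern, runs for every line.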
awk '!($0 in ids){ids[$0]=++i}{print ids[$0]}' file
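If the original strings need to be recovered from the numbers later, the same program can also write the id table out in an END block. A sketch, with input.txt and mapping.txt as hypothetical file names; awk's for (s in ids) visits keys in an unspecified order, so the table is sorted numerically afterwards:

```shell
#!/usr/bin/env bash
# Sample input from the question (input.txt is a hypothetical name).
printf '%s\n' a a b b c c > input.txt

# Print one id per line as before, and at the end write "id<TAB>string"
# pairs to mapping.txt (a hypothetical name).
awk '!($0 in ids){ids[$0]=++i} {print ids[$0]}
     END {for (s in ids) print ids[s] "\t" s > "mapping.txt"}' input.txt

# Array traversal order in awk is unspecified, so sort the table by id.
sort -n -o mapping.txt mapping.txt
cat mapping.txt
```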