Just what the title says. I am surprised by how few results a Google search turns up for this question! What I want is the equivalent of Python dictionaries, but in bash (and hence, it should work across OS X, Ubuntu and other major Linux distributions).
hput () {
    # Store the value in a dynamically named variable, e.g. hashFrance=Paris.
    # (Naive: keys must be valid identifier characters, and values containing
    # spaces or quotes will break the eval.)
    eval hash"$1"='$2'
}

hget () {
    # Expand the dynamically named variable back out.
    eval echo '${hash'"$1"'}'
}
hput France Paris
hput Netherlands Amsterdam
hput Spain Madrid
echo `hget France` and `hget Netherlands` and `hget Spain`
$ sh hash.sh
Paris and Amsterdam and Madrid
You can further modify the hput()/hget() interface so that you have named hashes as follows:
hput() {
    # $1 is the map name, $2 the key, $3 the value.
    eval "$1""$2"='$3'
}

hget() {
    eval echo '${'"$1$2"'}'
}
and then
hput capitols France Paris
hput capitols Netherlands Amsterdam
hput capitols Spain Madrid
echo `hget capitols France` and `hget capitols Netherlands` and `hget capitols Spain`
This lets you define other maps that don't conflict (e.g., 'rcapitols', which does country lookup by capital city). But either way, I think you'll find that this is all pretty terrible, performance-wise.
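For instance, a reverse map (hypothetical name rcapitols) built with the same two functions:

hput rcapitols Paris France
hput rcapitols Amsterdam Netherlands
echo `hget rcapitols Paris` and `hget rcapitols Amsterdam`

prints "France and Netherlands".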
If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well: write your key/values out to a temporary file, one per line, then use 'grep "^$key"' to get them out, piping through cut or awk or sed or whatever to retrieve the values.
Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the file/grep combo is going to be quite a bit faster than the eval approach - in my experience several times faster. It also eats less memory.
Here's one way to do it:
hinit() {
    # Start a fresh map named $1.
    rm -f /tmp/hashmap.$1
}

hput() {
    # Append "key value" as one line; uniqueness is not enforced.
    echo "$2 $3" >> /tmp/hashmap.$1
}

hget() {
    # Find the line starting with the key and print the value.
    # (Values containing spaces will be truncated at the first space.)
    grep "^$2 " /tmp/hashmap.$1 | awk '{ print $2 }'
}
hinit capitols
hput capitols France Paris
hput capitols Netherlands Amsterdam
hput capitols Spain Madrid
echo `hget capitols France` and `hget capitols Netherlands` and `hget capitols Spain`
To get a little more performance, remember that grep can stop after it finds the nth match; in this case n would be 1:
grep --max-count=1 ... or grep -m 1 ...
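For instance, the file-backed hget above could become (a minimal sketch, same /tmp layout as before):

hget() {
    # Stop scanning as soon as the first matching key is found.
    grep -m 1 "^$2 " /tmp/hashmap.$1 | awk '{ print $2 }'
}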
Prior to bash 4 there is no good way to use associative arrays in bash. Your best bet is to use an interpreted language that actually has support for such things, like awk. On the other hand, bash 4 does support them.
As for less good ways in bash 3, here is a reference that might help: http://mywiki.wooledge.org/BashFAQ/006
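To give a taste of the awk route: awk's arrays are associative out of the box, so a lookup over the same one-pair-per-line file format is a one-liner (a sketch, assuming the /tmp/hashmap.$1 layout shown above):

hget() {
    # Exact string comparison on the key (no regex pitfalls); exit on first hit.
    awk -v key="$2" '$1 == key { print $2; exit }' /tmp/hashmap.$1
}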
Bash 4 natively supports this feature. You declare an associative array by doing:
declare -A animals
You can fill it up with elements using the normal array assignment operator:
animals=( ["moo"]="cow" )
Or merge them:
declare -A animals=( ["moo"]="cow" )
Then use them just like normal arrays:
echo "${animals["moo"]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals["$sound"]}"; done
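A few more everyday operations, for completeness (all standard bash 4 syntax; the key-existence test needs bash 4.3+):

animals[woof]="dog"                  # add or overwrite a single entry
animals+=( [meow]="cat" )            # merge additional entries in
unset 'animals[moo]'                 # remove one entry
[[ -v animals[woof] ]] && echo yes   # test whether a key exists (bash 4.3+)
echo "${#animals[@]}"                # count the entries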
In bash 3, you don't have associative arrays. Do not use eval to emulate them. You must avoid eval like the plague, because it is the plague of shell scripting.
First and foremost: Consider upgrading to bash 4. Seriously. The future is now, stop living in the past and suffering from it by forcing stupid broken and ugly hacks on your code.
If you have some silly excuse why you "can't upgrade", declare is a far safer option. It does not evaluate data as bash code like eval does, and as such it does not allow arbitrary code injection quite so easily.
Let's prepare the answer by introducing the concepts:
First, indirection (seriously: never use this unless you have a very bad excuse for writing hacks).
$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow
Secondly, declare:
$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow
Bring them together:
# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}
Let's use it:
$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"
cow
Note: declare cannot be put in a function. Any use of declare inside a bash function turns the variable it creates local to the scope of that function, meaning we can't access or modify global arrays with it.
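If you must set entries from inside a function, printf -v is one way around that limitation: it assigns to a named variable without evaluating anything as code, and without declare's scoping surprise (a sketch, following the same naming scheme as above):

arraySet() {
    local array=$1 index=$2 value=$3
    # printf -v writes into the named variable; since no matching
    # local exists, the assignment lands in the global scope.
    printf -v "${array}_$index" '%s' "$value"
}

arraySet animals "$sound" "$animal"   # then: arrayGet animals "$sound"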
Once more: upgrade to bash 4. If you can't, consider awk before doing ugly hacks as described above. And definitely stay the heck away from eval hackery.
Two things: you can use memory instead of /tmp on any 2.6 kernel by using /dev/shm (Red Hat; other distros may vary). Also, hget can be reimplemented using read as follows:
function hget {
    # Scan the map file line by line with the shell's built-in read,
    # avoiding a fork of grep and awk entirely.
    while read key idx
    do
        if [ "$key" = "$2" ]
        then
            echo "$idx"
            return
        fi
    done < /dev/shm/hashmap.$1
}
In addition, assuming that all keys are unique, the return short-circuits the read loop and prevents having to read through all entries. If your implementation can have duplicate keys, then simply leave out the return. This saves the expense of forking both grep and awk. Using /dev/shm for both implementations yielded the following, using time hget on a 3-entry hash and searching for the last entry:
Grep/Awk:
hget() {
    grep "^$2 " /dev/shm/hashmap.$1 | awk '{ print $2 }'
}
time echo $(hget FD oracle)
3
real 0m0.011s
user 0m0.002s
sys 0m0.013s
Read/echo:
time echo $(hget FD oracle)
3
real 0m0.004s
user 0m0.000s
sys 0m0.004s
On multiple invocations I never saw less than a 50% improvement. This can all be attributed to fork overhead, since /dev/shm removes the IO cost from both implementations.
I really liked Al P's answer but wanted uniqueness enforced cheaply, so I took it one step further: use a directory. There are some obvious limitations (directory file limits, invalid file names), but it should work for most cases.
hinit() {
    # Start a fresh map: one directory per map, one file per key.
    rm -rf /tmp/hashmap.$1
    mkdir -p /tmp/hashmap.$1
}

hput() {
    # Writing the same key twice simply overwrites the file,
    # so the filesystem enforces uniqueness for us.
    printf '%s' "$3" > /tmp/hashmap.$1/$2
}

hget() {
    cat /tmp/hashmap.$1/$2
}

hkeys() {
    ls -1 /tmp/hashmap.$1
}
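Deletion falls out naturally too; a small sketch (hypothetical names hdel and hdestroy):

hdel() {
    rm -f /tmp/hashmap.$1/$2
}

hdestroy() {
    rm -rf /tmp/hashmap.$1
}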
hinit ids
for (( i = 0; i < 10000; i++ )); do
    hput ids "key$i" "value$i"
done
for (( i = 0; i < 10000; i++ )); do
    printf '%s\n' "$(hget ids "key$i")" > /dev/null
done
It also performs a tad bit better in my tests.
$ time bash hash.sh
real 0m46.500s
user 0m16.767s
sys 0m51.473s
$ time bash dirhash.sh
real 0m35.875s
user 0m8.002s
sys 0m24.666s
Just thought I'd pitch in. Cheers!