I used a Godel-inspired scheme:
Assign the primes P_1 to P_26 to the letters (in any order, but to obtain smallish hash values best to give common letters small primes).
Built a histogram of the letters in the word.
Then the hash value is the product of each letter's associated prime raised to the power of its frequency. This gives a unique value to every anagram.
Python code:
primes = [2, 41, 37, 47, 3, 67, 71, 23, 5, 101, 61, 17, 19, 13, 31, 43, 97, 29, 11, 7, 73, 83, 79, 89, 59, 53]
def get_frequency_map(word):
map = {}
for letter in word:
map[letter] = map.get(letter, 0) + 1
return map
def hash(word):
map = get_frequency_map(word)
product = 1
for letter in map.iterkeys():
product = product * primes[ord(letter)-97] ** map.get(letter, 0)
return product
This cleverly transforms the tricky problem of finding subanagrams into the (also known to be tricky) problem of factoring large numbers...