views:

179

answers:

5

I know something like this has been asked before, but the answer was sort of side tracked.

I want to develop a hash function which will take a word and spit out an address of an array.

So, for example, if you input god:

  1. sort the word, d o g

  2. perform some sort of function on this to get an address d o g -> some number

  3. insert 'dog' into address some_number in array[].

I can't seem to make a function which doesn't get screwed up somehow.

  public static int hashCode(String word){
     char[] x = word.toCharArray();
     Arrays.sort(x);
     int hash = 0;
     for(int i =0; i<x.length; i++)
     {
        hash +=(x[i]-96)*(x[i]-96)*(x[i]-96)*(i+1)*(i+1)+i; 
     }
     hash %=size; // get a value that's inside the bounds of the array
     if(hash<0)
      hash = hash + size;

     return (hash); 
    }

This is my current algorithm but there are two problems.

  • the array size has the be huge so that there aren't a ton of collisions

  • there still are a few collisions, chair for example, produces: smarminess, parr, chair

What do you guys think? I really appreciate your help

A: 

There is a lot of research on hash functions and collision resolution. Here's a place to start: Hash Table

Uh Clem
Well the thing is, the collisions can't be chained because they're so different. I do chain collisions which should be chained (dog is chained with god).
kodai
I think you misunderstand how the chaining works. The two don't have to be related. They just share the same bucket, that's all.
Kevin Bourrillion
Yep. Chaining works regardless of the hash function that you use.
Uh Clem
+1  A: 

Your hash function looks totally arbitrary. Why are you using that?

There are a few common, well known and relatively good hash functions, see a description here:

http://www.azillionmonkeys.com/qed/hash.html

See also http://stackoverflow.com/questions/263400#263416

Vinko Vrsalovic
Thank you for your help! My hash function is sort of arbitrary, but it's a product of trial and error. x^3 * i^2 + i the x^3 just makes it quadratic (other wise the integers are very small and become overshadowed). The i*2 is the weight. It weights each letter. The i helps maintain the length.
kodai
A: 

I guess that -- from your title and from the Arrays.sort(x) function -- that you're looking for a hash function that expressly collides when two strings are anagrams of each other. Is this correct? If so, you should specify that requirement INSIDE the question.

The article that Vinko suggested is good. I also recommend Integer Hash Function for other algorithms that you might try.

Good luck!

Chip Uni
A: 

If you you really want to develop a "hash" that deliberately collides for all anagrams (in other words one that's amenable to finding anagrams in a hash table) then why not split the string into an array of characters, filter out any characters you want to ignore (non-letters) and sort the results, concatenate and then hash that string.

Thus "dog" and "god" both get munged into "dgo" and that's your key for all anagrams of "dog."

In modern versions of Python all that verbiage can be summarized in the following one-line function:

def anagrash(s):
    return ''.join(sorted([x for x in s.lower() if s.isalpha()]))

... which you might use as:

anagrams = dict()
for each in phrases:
    ahash = anagrash(each)
    if ahash not in anagrams:
        anagrams[ahash] = list()
    anagrams[ahash].append(each)

... to build a dictionary of possible anagrams from a list of phrases.

Then, to filter out all of the phrases for which no anagram was found:

for key,val in anagrams:
    if len(val) < 2:
        del anagrams[key]

So, there's your homework assignment. Less than a dozen lines of Python. Porting that to whatever language your instructor is teaching and wrapping it in logic to read in the phrases and write out results is all left as an exercise to the student.

Jim Dennis
A: 

Thank you, everybody, for your help! I really appreciate it.

Uh Clem was correct, I didn't really understand the meaning of collisions, I thought they were supposed to be intentional, and I thought hash addresses were supposed to be absolute, but it seems that they're used as a pointer to a very small subset, not the element themselves.

So instead of a hash function giving you an exact house address, it gives you 3 or 4 houses it could be and you just have to search through those through. The extra houses being the collisions.

So thank you everybody for your help, you're a fantastic bunch.

kodai