views:

291

answers:

6

I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?

+3  A: 

You can examine each individual char from a std::string using the [] operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.

wheaties
so, my understanding is that hash functions map a string to an int, but usually these ints are mapped using a compression map to table addresses so that the hashtable is a more manageable size. is this applicable to the hash functions you recommended in the link?
zebraman
You mean buckets? There are a number of "usual" functions which are trade-offs in terms of size of the hash table produced and performance criteria. The biggest concern you should have is how many repeated values, that is, how uniformly distributed your results are. Poor hashing will invariably leave you with a small collection of linked lists rather than a constant amortized time look up table. I have not examined the later while I've seen Boost. Did I answer that?
wheaties
i think so. o_o
zebraman
A: 

You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.

To hash a string object to an integer you'll have to access each individual char of the string object which you can do as:

for (i=0; i < str.length(); i++) {
    // use str[i] or str.at(i) to access ith element.
}
codaddict
+1  A: 

xor the characters together, four at a time.

Stephen
i don't really understand what xor is/does. could you explain?
zebraman
xor is a bitwise operator meaning "one-but-not-both", the '^' operator in c++.e.g.0 ^ 1 => 11 ^ 1 => 03 ^ 1 => 2 (11 ^ 01 => 10)It'll give you a randomish integer value.Either way, you'll need to traverse the string in a way similar to Alex Martelli's solution. So go with that and you don't need to worry about word size. :)
Stephen
thanks for the help
zebraman
That's not a great hash function. For example, on ASCII data it won't touch the 8th, 16th, 24th or 32nd bits of the word at all. As a practical effect, if your hashtable has 512 buckets, then half of them would never be used by ASCII strings. You want to introduce some co-prime numbers somewhere along the line, and restricting the bucket count to compensate for a weakness in the hash just isn't necessary given the availability of better hashes that aren't much slower.
Steve Jessop
Fair point. I hadn't intended this to be a good hash function, just a simple hash function. There are plenty of better hashing algorithms described by the links in other answers. I had assumed (perhaps mistakenly) that hash<string> wasn't available and the question didn't really ask for performance or hash quality. I should have stated that explicitly.
Stephen
+2  A: 

Re the first question, sure, e.g, something like:

int hash = 0
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
  hash = hash << 1 | (*it - offset);
}

regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).

Alex Martelli
i see. how about if i wanted to do case-insensitive hashing. where A=a=1?
zebraman
+1, if only for use of `*2` and `|` to create a comedicaly poor hash ;-)
Steve Jessop
@zebraman, see http://www.cplusplus.com/reference/clibrary/cctype/tolower/
Alex Martelli
A: 

Here's a nice page about hash functions - "Hash functions" by Paul Hsieh (thanks Paul).

Nikolai N Fetissov
A: 
#include <iostream>
#include <string>
#include <algorithm>

using namespace std;

// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
    hash() : acc(5381) { }
    template<typename Ch>
    void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
    operator Int() const { return acc; }
    Int acc;
};

int main(int argc, char* argv[])
{
    string s("Hellp, world");
    cout << hex << showbase
        << for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
    return 0;
}
wilhelmtell