views:

39

answers:

2

In the local object there is a collate facet.

The collate facet has a hash method that returns a long.
http://www.cplusplus.com/reference/std/locale/collate/hash/

Two questions:

  • Does anybody know what hashing method is used.
  • I need a 32bit value.
    If my long is longer than 32 bits, does anybody know about techniques for folding the hash into a shorter version. I can see that if done incorrectly that folding could generate lots of clashes (and though I can cope with clashes as I need to take that into account anyway, I would prefer if they were minimized).

Note: I can't use C++0x features
Boost may be OK.

+2  A: 

No, nobody really knows -- it can vary from one implementation to another. The primary requirements are (N3092, §20.8.15):

For all object types Key for which there exists a specialization hash, the instantiation hash shall:

  1. satisfy the Hash requirements (20.2.4), with Key as the function call argument type, the DefaultConstructible requirements (33), the CopyAssignable requirements (37),
  2. be swappable (20.2.2) for lvalues,
  3. provide two nested types result_type and argument_type which shall be synonyms for size_t and Key, respectively,
  4. satisfy the requirement that if k1 == k2 is true, h(k1) == h(k2) is also true, where h is an object of type hash and k1 and k2 are objects of type Key.

and (N3092, §20.2.4):

A type H meets the Hash requirements if:

  1. it is a function object type (20.8),
  2. it satisifes the requirements of CopyConstructible and Destructible (20.2.1),
  3. the expressions shown in the following table are valid and have the indicated semantics, and
  4. it satisfies all other requirements in this subclause.

§20.8.15 covers the requirements on the result of hashing, §20.2.4 on the hash itself. As you can see, however, both are pretty general. The table that's mentioned basically covers three more requirements:

  1. A hash function must be "pure" (i.e., the result depends only on the input, not any context, history, etc.)
  2. The function must not modify the argument that's passed to it, and
  3. It must not throw any exceptions.

Exact algorithms definitely are not specified though -- and despite the length, most of the requirements above are really just stating requirements that (at least to me) seem pretty obvious. In short, the implementation is free to implement hashing nearly any way it wants to.

Jerry Coffin
A: 

If the implementation uses a reasonable hash function, there should be no bits in the hash value that have any special correlation with the input. So if the hash function gives you 64 "random" bits, but you only want 32 of them, you can just take the first/last/... 32 bits of the value as you please. Which ones you take doesn't matter since every bit is as random as the next one (that's what makes a good hash function).

So the simplest and yet completely reasonable way to get a 32-bit hash value would be:

int32_t value = hash(...);

(Of course this collapses groups of 4 billion values down to one, which looks like a lot, but that can't be avoided if there are four billion times as many source values as target values.)

sth
If long was 64. would xor of the high and low be an acceptable method?
Martin York
@Martin: It would, but if it's a good hash function than it wouldn't make it any more or less random than just taking the high or low 32 bits by themselves. If it isn't a good hash function the xor could make it better or worse, depending on *how* that hash function isn't good. A good hash function tries to achieve "maximal randomness" of all the bits. For the 64-bit value to be "random" also its 32-bit parts have to be "random". Xoring them together won't hurt, but the single parts should already be as "random" as it gets.
sth