Is it possible, using a certain hash function and method (the division method, or double hashing), to make a chained hash table that can be resized without having to reinsert (rehash) each element already in the table?

+1  A: 

Probably not - the hash function would have to avoid any kind of modulus (or other dependence on the table size), which would mean the required table size would be dictated by the data anyway.
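
A tiny illustration of the point (Python, purely for demonstration; in CPython hash(42) happens to equal 42): with the division method the bucket index is hash(key) % table_size, so the index generally moves when the table size changes:

    h = hash(42)    # the hash value itself does not depend on the table size
    print(h % 8)    # -> 2   bucket index in a table with 8 slots
    print(h % 16)   # -> 10  bucket index after growing to 16 slots; without
                    #        rehashing, a lookup would now search the wrong bucket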

Walt W
A: 

All hash tables must deal with collisions, whether through chaining or probing, so I suspect that if on resize you simply enlarged the table (i.e. without re-inserting everything), you would have a functional, though highly non-optimal, hash table.

dicroce
Changing the table size will, in the general case, change the bucket an item is hashed to. Hence hashing an item that is already in the table will probably yield a different bucket, and you will falsely conclude that the item is not in the table. You would have to store, for every item, the size of the table at the moment the item was inserted in order to reproduce the bucket it was hashed to when the item is looked up again. But how can you find that value when you hash an item? You could only perform a linear search.
Daniel Brückner
+1  A: 

I can only assume the reason you want to avoid rehashing everything is that the resulting high-latency operation is not an issue for throughput but is instead a problem for responsiveness (either in the human or in the SLA sense).

In theory you could use a modified closed-addressing (chained) hash table, like so (a rough sketch follows the list):

  1. Remember all previous sizes at which elements were added.
  2. On resize, keep the old buckets around, linked to internally via a map of sizeWhenUsed -> buckets (obviously there is no need to bother if the buckets are empty).
  3. Invariant: a mapping for a key k exists in only one of the 'internal hash tables' at any time.
  4. On addition of a value you must first look it up in all the other maps to determine whether the entry already exists and is mapped. If it is, remove it from the old one and add it to the new one.
    • If an internal map becomes empty or falls below a certain size, it should be deleted and its remaining elements moved into the current hash table.
  5. So long as the number of internal hashes is kept constant, this will not impact the big-O behaviour of the data structure in time, though it will in memory.
    • It will, however, affect the actual performance, as X additional checks must be made, where X is the number of old hashes maintained.
    • If the wasted space of the lists of buckets becomes significant (the buckets themselves are null when empty, so they cost nothing unless populated; use a fudge factor to decide), then at some point on a rehash you may have to take the hit of moving things into the current table, unless you are willing to expend essentially unlimited memory.
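
Here is a rough Python sketch of that multi-generation idea (the class and method names are mine; the merging away of small old generations and deletion support are omitted for brevity):

    # Sketch only: a chained table that, instead of rehashing on growth,
    # pushes a new, larger bucket array and keeps the old ones as extra
    # "generations" that lookups must also consult.
    class MultiGenerationHashTable:
        def __init__(self, initial_capacity=8, max_load=0.75):
            self.max_load = max_load
            self.count = 0
            # Newest generation first; each generation is a list of chains.
            self.generations = [[[] for _ in range(initial_capacity)]]

        def _chain(self, buckets, key):
            return buckets[hash(key) % len(buckets)]

        def get(self, key, default=None):
            # A key may live in any generation (the extra checks from point 5).
            for buckets in self.generations:
                for k, v in self._chain(buckets, key):
                    if k == key:
                        return v
            return default

        def put(self, key, value):
            # Maintain the invariant (point 3): remove the key from any
            # older generation before inserting into the current one.
            for buckets in self.generations[1:]:
                chain = self._chain(buckets, key)
                for i, (k, _) in enumerate(chain):
                    if k == key:
                        del chain[i]
                        self.count -= 1
                        break
            current = self.generations[0]
            chain = self._chain(current, key)
            for i, (k, _) in enumerate(chain):
                if k == key:
                    chain[i] = (key, value)
                    return
            chain.append((key, value))
            self.count += 1
            if self.count > self.max_load * len(current):
                self._grow()

        def _grow(self):
            # "Resize" without rehashing: simply start a new, larger generation.
            self.generations.insert(0, [[] for _ in range(2 * len(self.generations[0]))])

Every lookup and insertion pays one extra probe per retained generation, which is exactly the constant-factor (not big-O) cost described above.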

Shrinking the hash will only work in the desired manner (releasing memory) if you are willing to rehash. That is unavoidable.

It is possible you could make use of some complex additional data within an open addressing scheme to 'flag' which of the internal hashes a cell is in use by, but removals would be extremely complex to get right and would be very expensive unless you just left them as wasted space. I would never attempt this.

I would not suggest using the former method either, unless the underlying data spends very little time in the hash, so that the resulting churn tends to steadily 'erase' the older-sized hashes. A hash tuned for just this sort of behaviour and preset with an appropriate size would likely perform much better, though.

Since the above scheme simply trades wasted memory and throughput for a reduction in the expensive operations, with at best a speculative chance of recovering that waste, the more sensible option is usually just to pre-size your hash to be larger than required so that it never needs to resize.

ShuggyCoUk
+2  A: 

You would still need to reinsert, but one way to make that cheaper would be to store the hash value from before the modulus was applied. That way, you save a large part of the calculation cost of rehashing.

With this approach, it would be possible to shrink the table in size as well.
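
As an illustration (my own sketch, not from the answer): if each entry stores the full hash alongside the key, a resize only has to redo the cheap modulus, never the hash function itself:

    # Each chain entry is (full_hash, key, value); resize() re-buckets using
    # the stored hash, so the hash function is called exactly once per key.
    class ChainedHashTable:
        def __init__(self, capacity=8):
            self.buckets = [[] for _ in range(capacity)]
            self.count = 0

        def put(self, key, value):
            h = hash(key)                        # the potentially expensive part
            chain = self.buckets[h % len(self.buckets)]
            for i, (hk, k, _) in enumerate(chain):
                if hk == h and k == key:
                    chain[i] = (h, key, value)
                    return
            chain.append((h, key, value))
            self.count += 1
            if self.count > 0.75 * len(self.buckets):
                self.resize(2 * len(self.buckets))

        def get(self, key, default=None):
            h = hash(key)
            for hk, k, v in self.buckets[h % len(self.buckets)]:
                if hk == h and k == key:
                    return v
            return default

        def resize(self, new_capacity):
            # Works for shrinking too, as noted above: only the modulus changes.
            new_buckets = [[] for _ in range(new_capacity)]
            for chain in self.buckets:
                for entry in chain:
                    new_buckets[entry[0] % new_capacity].append(entry)
            self.buckets = new_buckets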

Thorarin
Very clever if your hash algorithm is time-consuming
Walt W
A: 

I assume you're asking this question because you want to avoid the high cost of resizing a hash table. You want a hash table with guaranteed constant-time operations (assuming no collision problems, of course). This can be done.

The trick is to iteratively initialize the next-size hash table while the current one is filling up. By the time you need it, it's ready.

Quick pseudo-code to add an element:

if resizing then
    smallTable = bigTable
    bigTable = new T[smallTable.length * 2] // if allocation zeroes memory, we lose O(1)
    set state to zeroing
elseif zeroing then
    zero a small amount of the bigTable memory
    if done zeroing then set state to transferring
elseif transferring then
    transfer a few values from the small table to the big table
    if done transferring then set state to resizing
end if

add the new item to the small table
add the new item to the big table
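
For what it's worth, here is a small runnable Python sketch of the same incremental idea (the names and the per-operation migration quota are my own; the explicit zeroing state is unnecessary in Python because new lists start empty):

    # Incremental rehashing: a resize only swaps in a new bucket array;
    # the migration itself is amortised, a few old buckets per insertion,
    # so no single operation ever rehashes the whole table.
    class IncrementalHashTable:
        MIGRATE_PER_PUT = 4        # old buckets migrated per insertion
        MAX_LOAD = 0.75

        def __init__(self, capacity=8):
            self.buckets = [[] for _ in range(capacity)]   # current ("big") table
            self.old = None                                # previous ("small") table
            self.old_pos = 0                               # next old bucket to migrate
            self.count = 0

        def _chain(self, buckets, key):
            return buckets[hash(key) % len(buckets)]

        def get(self, key, default=None):
            tables = [self.buckets] if self.old is None else [self.buckets, self.old]
            for buckets in tables:
                for k, v in self._chain(buckets, key):
                    if k == key:
                        return v
            return default

        def put(self, key, value):
            if self.old is not None:
                self._migrate_some()
            if self.old is not None:
                # Keep each key in exactly one table: drop any not-yet-migrated copy.
                chain = self._chain(self.old, key)
                for i, (k, _) in enumerate(chain):
                    if k == key:
                        del chain[i]
                        self.count -= 1
                        break
            chain = self._chain(self.buckets, key)
            for i, (k, _) in enumerate(chain):
                if k == key:
                    chain[i] = (key, value)
                    return
            chain.append((key, value))
            self.count += 1
            if self.old is None and self.count > self.MAX_LOAD * len(self.buckets):
                # Start a resize, but do not move anything yet.
                self.old = self.buckets
                self.buckets = [[] for _ in range(2 * len(self.old))]
                self.old_pos = 0

        def _migrate_some(self):
            for _ in range(self.MIGRATE_PER_PUT):
                if self.old_pos >= len(self.old):
                    self.old = None        # migration finished
                    return
                for entry in self.old[self.old_pos]:
                    self._chain(self.buckets, entry[0]).append(entry)
                self.old[self.old_pos] = []
                self.old_pos += 1

No single put touches more than MIGRATE_PER_PUT old buckets, so the worst-case cost per operation stays bounded; the price is that every operation may probe two tables while a migration is in flight.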
Strilanc