Consider an array versus a hashtable where the keys are just the integral indexes of a list.

Their average-case insertion, lookup, and removal big-O bounds are all O(1) constant time. I understand that you may get some low-level wins in cache locality with an array, and there is a marginal (mostly-constant) overhead to the hashtable operations, but hashtables get you sparseness for free, which in some applications is a big win.
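
Concretely, the two representations under discussion look something like this (a minimal sketch; JavaScript's `Map` stands in for a generic hashtable):

```javascript
// The same logical "list" with dense integer keys, stored two ways.
const asArray = ["a", "b", "c"];
const asTable = new Map([[0, "a"], [1, "b"], [2, "c"]]);

console.log(asArray[1]);     // O(1): base address + index * slot size
console.log(asTable.get(1)); // O(1) average: hash, find bucket, compare keys
```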

What other significant (or small) contrasts am I missing?

Context: I get into a discussion about this sometimes when interviewing programming candidates. Usually the context is "how would you implement the Javascript array type inside of the JS VM?" For densely-packed data, I side with a native array, but I want to have better reasoning than the intuition that "it seems less like overkill".

+1  A: 

Because a list is typically ordered whereas a hashtable isn't. If you add elements to a list and expect the ordering to remain consistent, a hashtable provides no guarantees about the order you get back out, while an array retains the ordering.

tvanfosson
Thanks tvanfosson. I updated the question: I realize I wasn't clear initially. I assume the keys in the table are just the integer indexes of the "list", meaning that the "order is preserved" if you access the objects by integer "index".
quixoto
That's not true of any efficient hash table I've ever heard of.
bmargulies
@bmargulies: Of course you're correct for an abstract enumeration through the hash buckets. But if all my keys represent indexes into a list, I can write a `for` loop over my known "key" range and get the values back out of the same indexes I put them in at.
quixoto
@Ben - Maybe I'm missing something but wouldn't you have to have a bucket per integer to make this happen? If you have more than one integer hash into the same bucket, you can't preserve order. If you do have one integer per bucket, how is that different than a sparsely populated array?
tvanfosson
@tvanfosson: Let's say the number of buckets and the hash algorithm are implementation-specific. But assuming you have some `put(key, obj)` operation and a `get(key)` operation, you can iterate over your indexes using the get operation with increasing keys; see the sketch below. (It could be that the answer is just "all the magic is in this implementation-specific part".)
quixoto
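
A minimal sketch of that iteration, with `Map`'s `set`/`get` standing in for the generic `put`/`get` mentioned above:

```javascript
// Recovering "list order" from a hashtable by iterating over the
// integer key range, regardless of insertion order or bucket layout.
const table = new Map();
table.set(2, "c");
table.set(0, "a");
table.set(1, "b");

for (let i = 0; i < 3; i++) {
  console.log(table.get(i)); // prints "a", "b", "c" in index order
}
```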
A: 

Because hash functions aren't free. Linear factors are important. Worst-case times are important. Count the instructions.

For the particular case you cite, the underlying implementation of Javascript, there may be so much other overhead that it wipes these issues out. Still, if someone tries to do something mathematical that really hits an array hard with simple numeric keys, an array has got to be better.

bmargulies
+3  A: 

An array is just a special case of a hash table where the hash function is the identity function

f(x) := x;

and the modulo used is the same as the table size (and thus the array size), so every key maps directly to its own slot.

If you do not allow duplicate keys, no bucket ever holds more than one entry, so you do not need the "next" collision-chain pointers, and voila, we have an array.

Because of the absence of a complex hash function and modulo calculation, this is very fast, but it is only practical when the array can be kept small (very large arrays with lots of empty slots waste memory and might trigger nasty things like swapping/thrashing to disk).
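
To make this concrete, here is a minimal sketch of a chained hashtable whose hash is the identity function (the `IntHashTable` name and API are illustrative, not any particular implementation). Once the bucket count covers the whole key range, the chains never grow past one entry and the structure degenerates into a plain array:

```javascript
// A minimal chained hashtable over integer keys.
class IntHashTable {
  constructor(numBuckets) {
    this.buckets = new Array(numBuckets).fill(null);
  }
  hash(key) {
    return key % this.buckets.length; // identity hash, then modulo
  }
  put(key, value) {
    const i = this.hash(key);
    // The "next" pointer is the collision chain; it is only ever
    // needed when numBuckets is smaller than the key range.
    this.buckets[i] = { key, value, next: this.buckets[i] };
  }
  get(key) {
    for (let e = this.buckets[this.hash(key)]; e !== null; e = e.next) {
      if (e.key === key) return e.value;
    }
    return undefined;
  }
}

const t = new IntHashTable(16); // buckets >= key range: behaves like an array
t.put(3, "c");
console.log(t.get(3)); // "c"
```

With fewer buckets than keys, the same structure falls back to chaining, which is exactly the extra overhead the question asks about.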

Ritsaert Hornstra
This is the correct answer.
slebetman
Interesting way of looking at it. Thanks. OK, so basically, there are no further obvious distinctions beyond what I perceive: really, a hashtable in this scenario is an array with somewhat more overhead and free sparseness, useful in a "does it fit your data/scenario better" kind of way.
quixoto
+1  A: 

When you look at it from the angle of someone who wants to implement the pseudo-array behaviour of Javascript, you are right that a hashtable is the better way to do this, especially since Javascript arrays have no fixed length and need to be able to accommodate entries at any index. Arrays in Javascript just look like arrays but behave more like hashtables.
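
For instance, a minimal sketch of that behaviour (modern engines typically back such an array with a sparse, dictionary-like representation rather than a million-slot buffer):

```javascript
// A Javascript array accepts a write at any index without allocating
// the gap in between, which is hashtable-like behaviour.
const a = [];
a[1000000] = "x";
console.log(a.length);  // 1000001, but no million slots were filled in
console.log(500 in a);  // false: the "hole" is not a real entry
```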

But in a language that is a bit closer to the machine, the performance and space benefits of using a real array for densely packed data are quite noteworthy. The advantages of a hashtable are essentially limited to sparse data, which is not what you would or should use an array for in the first place; that sparse case really is better served by a hashtable with integer keys.

Insertion, lookup, and removal are also O(1) for arrays in all cases, but with a much lower constant factor than hashtables (and not just because of cache locality), and arrays need less space per entry. If you remove or insert entries in a way that makes the following entries shift their index accordingly, the operation is O(n), with n being the number of entries that need to be moved; but it is also O(n) for a hashtable, again with a much higher constant overhead, since every moved entry must be re-inserted under a new key. That is the kind of operation for which you would be better off with a linked list. Growing an array is also less costly than growing a hashtable, which might need to rehash all of its entries.
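
A quick sketch of that shifting cost, with JavaScript's `Map` standing in for an integer-keyed hashtable (the insertion point is arbitrary):

```javascript
// Inserting in the middle shifts every following entry: O(n) either way.
// Array: one contiguous move of the tail.
const arr = [10, 20, 40];
arr.splice(2, 0, 30);
console.log(arr); // [10, 20, 30, 40]

// Integer-keyed map: every shifted entry is a full re-insertion under
// a new key, the same O(n) but with a larger constant per move.
const m = new Map([[0, 10], [1, 20], [2, 40]]);
for (let i = m.size - 1; i >= 2; i--) m.set(i + 1, m.get(i)); // re-key the tail
m.set(2, 30);
console.log([...m.entries()].sort((x, y) => x[0] - y[0]));
// [[0,10],[1,20],[2,30],[3,40]]
```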

All of the different collection types have their specific advantages and disadvantages. That's why there are so many of them in use.

x4u