tags:

views:

142

answers:

2

I want to implement a hash table in the following manner:

struct list
{ 
 char *string;
 struct list *next; 
};

struct hash_table
{
 int size;     /* the size of the table */
 struct list **table; /* the table elements */
}; 

Instead of struct hash_table like above, can I use:

struct hash_table
{
 int size;              /* the size of the table */
 struct list   *table; /* the table elements */
}; 

That is, can I just use a single pointer instead of a double pointer for the hash table elements? If yes, please explain the difference in the way the elements will be stored in the table?

+1  A: 

By using the double pointer, **table, once memory is allocated you can reference an array of elements, eg: table[2]. You then could access a hash table element directly, without having to traverse a linked list.

Alternatively, if you use only a single pointer, *table, you will only be able to reference a single element. So basically you would then need to use a data structure such as a linked list (ick) to store your hash table elements - just like how you have set up your list data structure.

I do not recommend using a single pointer, since you will have to do a linear list traversal in order to perform any operations on your hash table. In other words, to add an element you will have to traverse the linked list to find element n instead of just directly accessing it. Since operations on a hashtable are expected to be O(1) fast, this implementation eliminates the advantage of using a hashtable. You might as well just use a linked list at that point.

Justin Ethier
That is incorrect. Operations on hash tables are expected to be O(1).
Michael Aaron Safyan
@Justin, also both chained and open-addressing are valid implementations of hash tables, and for arbitrary hash functions, chaining typically works better. With a good hash function, though, open addressing is a better choice. That said, open addressing is significantly more complicated to implement, and both give you O(1) access in the average case.
Michael Aaron Safyan
Also, he could insert near the front of the list (between table[i] and table[i]->next), so the linear traversal argument is incorrect.
Michael Aaron Safyan
+3  A: 

Well, that depends. If you look at table[i], how will you know if it is empty or not? If you use list**, then table[i] is type list*, and so you can easily tell if it is empty based on if it is null. If you use type list*, then table[i] is a list, and so unless you use a null, empty, or some other value for the key as a sentinel value indicating that the list is empty, then it will not work. So, yes, you could use list*, but then you need to add an additional sentinel condition, which might also limit which types of keys you allow. Alternatively, you could just ignore the first element of list[i], however, that would be wasteful. I should also point out that using list* instead of list** makes inserting an element at the front of table[i] harder; if you use a list** type, then you simply need to set the new entry's next pointer to the current value of table[i], and then assign to table[i] the address of the newly allocated entry. If you use type list*, you will need to insert the item between table[i] and table[i]->next, which makes the logic for insertion unnecessarily complicated.

Also, I should add that your hash table definition is flawed. A hash table is supposed to map one set of items to another. Your list structure has a single value. It needs both a key and a value. A better declaration for a hash table would be the following:

typedef struct HashTableEntry
{
    char* key;
    void* value;
    struct HashTableEntry* next;
} HashTableEntry;

typedef struct HashTable
{
     HashTableEntry** entries;
     int capacity; // size of entries
     int length; // number of key/value pairs currently in map
} HashTable;
Michael Aaron Safyan
+1 the only thing I do differently is that I like to store the calculated hash in the entry to avoid having to call the hash function again if I need it.
Nick
@Nick, yeah, that's also a good idea, but it makes sense to make that its own struct, put the hash and the HashTableEntry* in that struct, and then make an array of such structs; otherwise, you end up paying for the hash code multiple times, since each element for any given list will have the same exact hash code.
Michael Aaron Safyan