views:

362

answers:

4

I don't know what data structure to use for this problem. I want the structure to have:

  • Constant time insertion or deletion.
  • Constant time retrieval by id.

The actual system is:

I've got a bunch of objects each with a unique id. My program will need to receive requests for an id and return the relevant object.

Whenever it receives a request I want it to: search the structure to see if it's there. If it is, return it. If it isn't, load it from the disk into memory (put it in the structure so that next time it is requested it doesn't have to use the disk) and then return it.

I'm using C.

Here's a similar question but I'm not sure how relevant it is.

+10  A: 

A Hash table might be a pretty good solution in your case -- even if it's not in O(1) when there's a colision : it's a quite efficient solution.

Pascal MARTIN
Be sure to look at several hash table implementations in order to find the best performance to your needs.If you want to learn more, don't miss to read about Google's "BigTable". They have a great Hash Table implementation. (it's actually a database, but uses serveral Hash Table concepts, and a Hash Table as a index)
Luís Guilherme
+2  A: 

The only structure that fits it is... C array. SomeObject arr[MAX_NUM_OBJECTS] handles this in fast and efficient way

alemjerus
Wait... how's your performance on deletion, when not at the end?
Carl Smotricz
Ditto for insertion.
Jon Skeet
O(1) because the only way you can get O(1) on lookup by ID is when you can use the ID as array index - and then deletion is just setting the entry to null.
Michael Borgwardt
++ @Michael Borgwardt: Right. In the case where the ID can be used as an array index, the identity function is a "perfect hash function". If the ID is bigger than this, you're back to normal hashing, which is also O(1) but can be slower by a sizeable constant bound.
Mike Dunlavey
+1  A: 

Why not just put them all in a file, mmap the file, and let the OS handle the caching?

Andrew McGregor
That's a good point the only reason for not doing that is that it is memory inefficient in this case. The object I'm working with are 268 bytes each and a page of mem is 4KB.
Ollie Saunders
OK, but reading 268 bytes off the disk is not necessarily going to be any faster than reading 4KB.
Jason Orendorff
A: 

Depends on how much data and what the keys are. Hash tables work well for large amounts of data that would be difficult to compare (strings, complex data structures, ect). For smaller data sets with integer keys you can often get better performance with simple red-black trees. This is because good hash functions take time to execute.

In reality neither of those really meet what you're asking. O(1) access is only truly achievable in perfect hashes (and no hash is perfect) and arrays. Meanwhile O(1) insertion is only possible in queues, dequeues, stacks, linked lists, and perfect hashes (which again, no hash is perfect).

To give you a better answer we really need to know what you are trying to accomplish. Any more information on what you are using the data structure for would help

Chris