views:

44

answers:

4

Hello! I need an algorithm that introduces data like a stack, so when I scan the structure I can read them in the same sequence as they were introduced, for sequential access. Also these values are stored in buckets, like a hashtable, so I can fragment the whole structure for disk storage and have fast random access.

Is there an algorithm like this, or should I have two separate structures? What's the best strategy?

Best regards!

+2  A: 

What I would probably do would be this:

  1. Create a hash table where you actually store the entries
  2. Create a stack that stores the true memory locations of the objects (not the objects themselves)
  3. Abstract these two structures behind a class (or something similar), so that its true implementation is hidden from the user.
Stargazer712
Also, as James pointed out, it seems like you are actually talking about a queue, but it doesn't change how I would implement things.
Stargazer712
pointed out in my post that a removal of an object without a key from a hashmap is not o(1) (when doing a stack.pop() operation), and removal of a random object from a stack (when doing a hashmap.remove(key) operation) is not o(1) either. this makes any removal have the penalties of both data structures without any of the gain. however get/peeks and put/pushes will be unaffected.
aepurniet
A: 

First of all, I think you mean "introduces data like a queue". A stack will return data in the reverse order it was entered. (Also, I'm not exactly sure what you mean by "introduces data" --- I'm not sure of that's a language problem, or a data structures expression that I've never heard).

My suggestion would be to use an intrusive linked-list (one where the link is store within the data object) to create your linear list. You can then put the data object (with the link attached) into a hashtable.

James Curran
+1  A: 

This would be something like an Ordered Map, right? Those are usually implemented by combining a linked list with whatever you want to use to implement a map (e.g. a hash table).

In Ruby 1.9, the specification of the Hash class (which is confusingly how Ruby spells "Map") was changed such that it preserves insertion order. Most Ruby 1.9 implementations I know implemented this as some sort of combination of a list and a hash table. Rubinius's implementation is especially easy to read, since it is written 100% in Ruby: kernel/common/hash.rb

Java has an implementation of an ordered map, called LinkedHashMap. Here's the source from Oracle OpenJDK 7: /share/classes/java/util/LinkedHashMap.java

Apache Commons Collections has an OrderedMap interface and two implementations: LinkedMap and ListOrderedMap.

If you are a little bit careful, you should be able to preserve the asymptotic complexity guarantees of an unordered map.

Jörg W Mittag
A: 

like stargazer pointed out an abstraction class with two data structures is one way to do it. however there are some considerations.

  1. when you pop a value would you also have its key? if not then removing it from the hashmap will be a tough operation (not O(1)). maybe its worth keeping every objects key in that object if this will be the primary mode of getting data out.

  2. when you remove a value from the table you also have to remove it from the list. maybe its worth keeping that objects ListNode in the object for an O(1) removal from the stack/queue.

if one method of removal is gonna dominate, it might be easy to take one of the above penalties.

  1. for instance if you would not remove things by key (ala hashmap) and only pop (ala stack/queue), and the key is easily computable from the object, (p1) is not much of a penalty and the abstraction approach works.

  2. if the hashmap method of removal is gonna really dominate, it might be worth it to sort the stack by hashkey, and then binary search and then remove it from the stack/queue.

  3. if you want the hashmap to store everything (never removes data), but keep a separate queue/stack for processing (only that removes data) then abstraction is gonna be perfect.

however if none of these are the case, and you cant change the object (to force it to implement your caching scheme), that would imply you have to cache both of those values yourself (object key, and listnode). which introduces a third data structure to your abstraction class. at this point you have to ask yourself if its worth it with this abstraction approach? in those situations where data will be removed both ways equally (or unpredictably) then its worth doing something prerolled, ala jorg mittag. those implementations i believe are straight hashmaps, but each hashnode also has a prev and next link like a list, and the hashmap itself has a tail and head. (for stack/queue like pops/pushes)

basically it comes down to how you are gonna use this data structure. no algorithm is can be viewed as the best strategy, without a breakdown of how all the methods will be called, the use cases.

aepurniet