views:

133

answers:

3

I have a question about implementing a data structure (with algorithm) that has these features:

  1. There are too many places (like stores), and each place can store too many items, I want to store items in the places. The point is we need to have optimized insertion (or deletion) and optimized search feature.

  2. Search can be done based on places (return items in the place) or base on items (return places that have the item)

I thought to use a hashmap for this, and use places ID as the key and store items as a collection (again a hashmap with item id for both key and value).

So based on this, the insertion of each item or place would be O(1) but get or remove of place will be O(n) and for item it will be O(n*k)! (assume we have "n" places and "k" items - I hope this is the correct calculation).

Is there any better data structure or algorithm for this problem?

+1  A: 

Why not use balanced trees? You would need one tree at each store, mapping item ID to some internal information, and one global tree, mapping item ID to the list of the nodes having the given item. Your operations will be O(log n + log k).

Edit:
the operations in this case would look like the following: (I use the fact that std::map and std::set are tree-based in C++)

using namespace std;

typedef int itemid_t;
typedef int storeid_t;

map<itemid_t, set<storeid_t>> itemID2storeList;

class Store
{
    storeid_t Id;

    // insert requires one lookup in the map (O(log k))
    // plus one insert into a set (O(log n)).
    void insert(itemid_t item)
    {
        itemID2storeList[item].insert(Id);
    }

    // the same as for insert
    void delete(itemid_t item)
    {
        itemID2storeList[item].erase(Id);
    }

    // search in a store requires one lookup in a map (O(log k))
    // and one search in a set (O(log n)).
    bool have(itemid_t item)
    {
        const set<storeid_t>& storesForItem = itemID2storeList[item];
        return storesForItem.find(Id) != storesForItem.end();
    }
}

class Central
{
    // getting the list of stores having the given item requires
    // one lookup in a map (O(log k))
    const set<storeid_t>& storesForItem(itemid_t item)
    {
        return itemID2storeList[item];
    }
}
Vlad
Thanks for your answer, I'll try to implement that.BTW, can you provide an example (in java or C#) a link or something?
sander
Oh, I should add something, for this problem Items only have item id nothing more, I mean no other information as internal info.
sander
@sander: I've added a C++-based sample.
Vlad
+1  A: 

Why not use a real DB for that? I think VistaDB, SQLite etc. would be perfect for the job?

Lucero
Maybe because insertion into a DB (together with maintaining the proper indices for search) is quite slow?
Vlad
Or maybe because a database is not an algorithm. Using a database is delegating the question of choosing the right algorithm to the database developers and database administrator (which need to make the database generic enough, so its algorithms won't necessarily be optimal for your specific case).
Vlad
Well, even if your question does not address it I suspect that you will have some requirements regarding persistence as well. If you work on disk you may choose other algorithms and data structures than if you work in RAM, and it also depends a lot on the amount of data that you want to manage. If you think my answer was inappropriate I'm happy to delete it, but from your problem description I still believe that using a finished DB application may be the answer.
Lucero
I think any answer can give some clue to the OP about solution to his problem. Sometimes people ask not what they exactly need to know, so any related information is useful.
Vlad
Thanks for your answers,I should say that there is no access to a DB for this problem, (otherwise, it would not be hard to handle that!), indeed it is supposed to be an algorithm to handle (get(search), put, remove) the data fast, though in the optimum way.
sander
I really have no clue about difference to handle them on RAM or HDD now! For sure, data come from a source (consider a plain text file as the source) and should be store in the same place! But I believe during the process, I mean after loading data to the RAM, data will stay in RAM and only after complete process of all commands, (add, remove, search ,…) at the end of the day they should update the source. ( I emphasize that there is no access to any kind of DB). I hope it was clear enough.Thanks again
sander
A: 

The best algorithm/structure depends on how it will be used.

BTW this is a "grand truth". It applies to everything else in life. Picking the best car, bank, wife, etc.

Jay