views: 89

answers: 3

I have a large number of fixed-size records. Each record has many fields; ID and Value are among them. I am wondering what kind of data structure would be best so that I can

  1. locate a record by ID (unique) very fast,

  2. list the 100 records with the largest values.

A max-heap seems to work, but it's far from perfect; do you have a smarter solution?

Thank you.

A: 

A max-heap would match the second requirement, but a hash map or a balanced search tree would be better for the first one. Make the choice based on the frequency of these operations: how often would you need to locate a single item by ID, and how often would you need to retrieve the top 100 items?

Pseudo code:

add(Item t)
{
    // Add the same object instance to both data structures
    heap.add(t);
    hash.add(t);
}
remove(int id)
{
    heap.removeItemWithId(id); // linear scan -- this is going to be slow
    hash.remove(id);
}
getTopN(int n)
{
    return heap.topNItems(n);
}
getItemById(int id)
{
    return hash.getItemById(id);
}
updateValue(int id, String value)
{
    Item t = hash.getItemById(id);
    // t is the same object referred to by both the heap and the hash
    t.value = value;
    // both see the new value, but the heap must re-sift t
    // (e.g. remove and re-add it) to restore heap order
}
Amarghosh
> How often

That's the problem. I will have roughly 50 requests per second, and which kind of request dominates varies over time. So: allocate memory for each record, have one ID -> record pointer hash map/balanced search tree to serve the first requirement, and a value -> record pointer max-heap to satisfy the second requirement?
If modifications are not too frequent, I believe using a hash for ID-based retrievals and a heap for the sorted list wouldn't be a bad idea -- can't say for sure without testing, though.
Amarghosh
what makes it tricky is that the values are being updated all the time ...
If the hash and heap are both referring to the same object instance, you can modify values using an id-based retrieval from the hash and the changes made will be automatically reflected in the heap's references too.
Amarghosh
but won't that affect the value-based max-heap?
not if both heap and hash are referring to the same instance of the data object. see the updated pseudo code
Amarghosh
There is an added overhead in terms of the speed of add/remove operations. Memory overhead would be minimal, as only one copy of the data object exists per item (both heap and hash point to the same item in memory).
Amarghosh
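The pseudo-code above can be sketched as runnable Java, assuming a `HashMap` for the ID index and a `PriorityQueue` for the max-heap (the `Store`/`Item` names and the `int` value type are illustrative, not from the answer). Note that `PriorityQueue.remove(Object)` is a linear scan -- the slow removal the comment warns about -- and that updating a value needs a remove/re-add to keep heap order valid:

```java
import java.util.*;

class Item {
    final int id;
    int value;
    Item(int id, int value) { this.id = id; this.value = value; }
}

class Store {
    private final Map<Integer, Item> byId = new HashMap<>();
    // PriorityQueue is a min-heap by default; reverse the comparator for a max-heap.
    private final PriorityQueue<Item> byValue =
        new PriorityQueue<>(Comparator.comparingInt((Item t) -> t.value).reversed());

    void add(Item t) {                    // O(log n)
        byId.put(t.id, t);
        byValue.add(t);
    }

    void remove(int id) {                 // O(n): heap removal is a linear scan
        Item t = byId.remove(id);
        if (t != null) byValue.remove(t);
    }

    Item getById(int id) {                // O(1) expected
        return byId.get(id);
    }

    // Mutating t.value in place would silently break the heap invariant,
    // so remove and re-insert the same instance to restore heap order.
    void updateValue(int id, int value) { // O(n) because of the heap removal
        Item t = byId.get(id);
        if (t == null) return;
        byValue.remove(t);
        t.value = value;
        byValue.add(t);
    }

    List<Item> topN(int n) {              // polls a copy, so the heap is untouched
        PriorityQueue<Item> copy = new PriorityQueue<>(byValue);
        List<Item> out = new ArrayList<>();
        while (out.size() < n && !copy.isEmpty()) out.add(copy.poll());
        return out;
    }
}
```

`remove` and `topN` use identity equality on the shared `Item` instances, which is exactly why both structures must hold the same object rather than copies.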
A: 

I know you want a pseudo-code algorithm, but in Java, for example, I would use a TreeSet and add all the records as ID/value pairs.

The tree will keep them sorted by value, so querying the first 100 will give you the top 100. Retrieving by ID will be straightforward.

I think the underlying algorithm is called a binary tree or balanced tree (Java's TreeSet is backed by a red-black tree), not sure.

medopal
Interesting, maybe it's worth taking a look at TreeSet's code.
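A minimal sketch of the TreeSet idea, assuming records are ordered by value descending with an ID tie-break (the comparator must break ties, because a TreeSet treats comparator-equal elements as duplicates and would drop records with equal values); the class and method names here are illustrative, not from the answer:

```java
import java.util.*;

class ValueSortedRecords {
    static final class Rec {
        final int id, value;
        Rec(int id, int value) { this.id = id; this.value = value; }
    }

    // Order by value descending; the id tie-break keeps records with
    // equal values from being collapsed into one element.
    private final NavigableSet<Rec> byValue = new TreeSet<>(
        Comparator.comparingInt((Rec r) -> r.value).reversed()
                  .thenComparingInt(r -> r.id));

    void add(Rec r) { byValue.add(r); }

    // The set iterates highest value first, so the top n is just a prefix walk.
    List<Rec> top(int n) {
        List<Rec> out = new ArrayList<>(n);
        for (Rec r : byValue) {
            if (out.size() == n) break;
            out.add(r);
        }
        return out;
    }
}
```

One caveat: looking a record up *by ID* in this value-ordered set is O(n), so in practice you would pair it with a hash map from ID to record, much as the other answer suggests.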
+1  A: 

A hybrid data structure will most likely be best. For efficient lookup by ID, a hash table is the obvious choice. To support top-100 iteration, a max-heap or a binary tree is a good fit. When inserting and deleting, you just do the operation on both structures. If the 100 for the iteration case is fixed, iteration happens often, and insertions/deletions aren't heavily skewed toward the top 100, just keep the top 100 as a sorted array with overflow into a max-heap. That won't change the big-O complexity of the structure, but it will give a really good constant-factor speed-up for the iteration case.

Ants Aasma
If the IDs are unique and dense, an array is better than a hash table; if unique and sparse, a B-tree is often used.
Svante
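The fixed-top-N variant can be sketched as follows, assuming N is known up front: the current top N live in a small sorted set and everything else overflows into a max-heap, so iterating the top N is a plain ordered walk with no heap copying. `TopNTracker` is an illustrative name; deletions and promotions from the overflow heap back into the top set are omitted for brevity.

```java
import java.util.*;

class TopNTracker {
    private final int n;
    private final TreeSet<int[]> top;            // {value, id}, ascending by value
    private final PriorityQueue<int[]> overflow; // max-heap on value

    TopNTracker(int n) {
        this.n = n;
        Comparator<int[]> byValueThenId =
            Comparator.<int[]>comparingInt(a -> a[0]).thenComparingInt(a -> a[1]);
        top = new TreeSet<>(byValueThenId);
        overflow = new PriorityQueue<>(byValueThenId.reversed());
    }

    void add(int id, int value) {
        top.add(new int[]{value, id});
        if (top.size() > n) {
            // The smallest of the current top-N falls out into the overflow heap.
            overflow.add(top.pollFirst());
        }
    }

    // IDs of the top-N records, highest value first -- no heap traversal needed.
    List<Integer> topIds() {
        List<Integer> ids = new ArrayList<>();
        for (Iterator<int[]> it = top.descendingIterator(); it.hasNext(); )
            ids.add(it.next()[1]);
        return ids;
    }
}
```

The point of the split is that reads of the top N never touch the heap; only inserts that displace a top-N element pay the extra O(log n) heap push.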