views:

329

answers:

5

I have a scenario where I'm pushing change-lists to another system. Each list contains zero or more inserted, updated or deleted notifications.

Inserting is easy; the notification contains the target index and a pointer to the item. Updating is easy; I pass a pointer to the item.

Deleting seems straight-forward; I need to pass the index of the item to delete, but how do I know the index? Indexes start at zero and must be contiguous, but I make them up at insertion time. So I need to keep track of the index I make up for each item.

I can do this with, for example, a map: std::map<item*, int>, but then when I remove an item, I have to go re-number everything past it, which is O(N).

These lists of items are going to be large to the point where O(N) iteration is not acceptable. I'm sure this problem has been solved, I just don't know what the solution would be called. Searching for anything related to "linked list" creates a ton of noise.

One possible solution is a skip-list, where each node in the sublists knows how many nodes in the main list it skips, and since searching a skip list is O(log N) we can keep track as we go and find the index in O(log N) and also delete items in O(log N).

However implementing a skip-list seems like overkill here... is there a simpler solution?

EDIT:

Thanks all for your suggestions, but I think I've convinced myself the skip list is the right way to solve this problem here.

+3  A: 

See Finger Trees: A Simple General-purpose Data Structure by Hinze and Paterson.

See also the nice illustrations in MarkCC's blog post on finger trees.

Greg Bacon
Heh, I think I could implement a skip-list in C++ before I could grok that paper in it's fullness. :) I'll look at it though.
jeffamaphone
+1  A: 

edit: My previous solution was flawed std::vector::erase() is linear when moving elements. My new suggestion extends my previous comment to your question.

If you just work with pointers in the list, you can set the pointer to 0 after calling delete on the pointer, keeping the indices/keys valid. Then you should be able to use increasingly larger indices until the next index goes beyond std::numeric_limits<int>::max(). Then, when a larger portion of your list contains unused pointer elements set to zero, perform a synchronized cleanup of zero-pointers on both sides of the communication channel, followed by a recalculation of the indices. I don't know a good heuristic for this off the cuff, but you could keep track of the number of zero pointers, and compare it to the overall list size.

In fewer words, since calculation of the indices is O(n), delay it until you absolutely have to.

Mads Elvheim
Yeah, the sad part is I can't really control the other side. But you make a valid point.
jeffamaphone
+1  A: 

Couldn't you keep the deletion history and compensate for this when you do the lookup in the std::map<item*, int>?

What I mean is that the index in the std::map represents the original index of the item and then you have an auxillary map std::map<int, int> which stores how many times a given index has been deleted?

item* todelete; //from somewhere
std::map<int, int> history; //stored somewhere
std::map<item*, int> itemIndices; //stored somewhere
const int originalIndex = itemIndices[todelete]; //index of item at insert time
int index = originalIndex;
for (std::map<int, int>::const_iterator it = history.begin(); it != history.end() && it->first < originalIndex; ++it) {
    index -= it->second;
}
// index has now been compensated for previous deletes
// ... send the delete request with 'index'
// update the history with this delete request
std::map<int, int>::iterator it = history.find(index);
if (history.end() == it) {
    history[index] = 1;
} else {
    ++it->second;
}

The speed of this will of course depend on the size of the history.

/A.B.

Andreas Brinck
This is still linear, sadly.
jeffamaphone
+1  A: 

How often will deletion occur? I'm thinking of keeping your solution using std::map<item*, int>, but instead of updating the map upon deletion replacing the item in the linked list with a "NULL"-item to ensure that the indices in your lookupmap remains valid. This may not be a good solution if you will see frequent deletions and there's a chance you will run out of memory. Also, you could do this and have a reindex()-method that removes any NULL item from the linked list and assigns new indices to all items.

Sidenote 1: Can't you pass the pointer to item being deleted as you do in update? If the you do this and use a double-linked list the deletion operation could be performed easily in O(1).

Sidenote 2: Consider using boost::unordered_map over std::map.

larsm
Deletion itself can be done in constant time, but finding the item in the list is still linear. The item pointer I'm passing is the value for items in the list on the other end, not the pointer to list nodes themselves.Boost is banned from the project, for good or ill. Where appropriate we're using hash_map (std/stdext).
jeffamaphone
This was exactly what I suggested as well :-)
Mads Elvheim
A: 

Skip List is the correct solution.

jeffamaphone