I have a list of changes to a list - Adds and Deletes. The list could be huge - say 10'000 items.

I want to know the state of the list after change 9'000.

I could walk the list from the start all the way to change 9'000. That seems a bit long-winded to me.

I could keep a list of items and record when they're added and when they're deleted, then walk this list to see what is in the list at a particular change. If Adds and Deletes were equally likely, I would halve the number of list elements I would need to walk through...

But Big O notation says that halving the size of the problem doesn't make things more efficient (if I've understood it correctly).

I could cache the state of the list at every 100th or 1000th change... but again, big O says that dividing the number of items by 'n' doesn't make things more efficient.

So what is the efficient way of doing this? Is there an efficient way of doing this?

More details: Specifically, I'm tracking memory allocations / deallocations in a custom allocator. Each allocation / deallocation is an event in the list. Each allocation has a unique id. I'd like to know what is currently allocated after (e.g.) 9'000 events.

My first idea was to store, for each id, the event it was allocated and the event it was deallocated. Then to walk this list up to the first allocation whose alloc event is greater than 9000. But like I said, this would only halve the number of items that I'd need to walk through.
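To make that idea concrete, here's a rough sketch of the bookkeeping I had in mind (the types and names below are just illustrative, not my actual allocator code):

    // Rough sketch: one record per allocation id, kept sorted by allocEvent.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct AllocationRecord {
        std::uint64_t id;          // unique allocation id
        std::size_t   allocEvent;  // event number at which it was allocated
        std::size_t   freeEvent;   // event number at which it was freed
                                   // (SIZE_MAX if it was never freed)
    };

    // Walk the records up to the first allocation made after `event`;
    // anything allocated by then and not yet freed is still live.
    std::vector<std::uint64_t> liveAfter(const std::vector<AllocationRecord>& records,
                                         std::size_t event) {
        std::vector<std::uint64_t> live;
        for (const AllocationRecord& r : records) {
            if (r.allocEvent > event) break;   // sorted by allocEvent, so we can stop
            if (r.freeEvent > event) live.push_back(r.id);
        }
        return live;
    }

The loop still visits every record allocated up to the target event, which is why this at best halves the walk.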

I like the point made by Mike F - walking from the nearest 100th item is constant time...

A: 

What sort of structure are you working with? There isn't an efficient way to walk a generic data structure, but there are plenty of optimisation techniques and efficient methods for specific structures.

And yes, if you have an algorithm with O(n) time complexity, halving the number of items won't change it from O(n)... but it does mean each item costs only half of what it did originally. Big O notation is a good way of classifying algorithms, but it doesn't really get into efficiency except at huge input sizes. (Sorting is a good example: quicksort has worse worst-case complexity than mergesort, yet quicksort can be implemented more efficiently than mergesort for almost any application that isn't sorting millions of items.)

workmad3
+1  A: 

If you cache the state of the list every Xth change, then you can do a binary chop to get down to the two cached states bounding the change you're looking for, and then walk at most X items to get to the item itself. That's O(log N), more or less.
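Roughly like this (a sketch only; it assumes each cached state stores the set of live ids, which may or may not suit your allocator):

    // Sketch of the "cache every Xth change, binary chop, then replay" idea.
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <set>
    #include <vector>

    struct Event {
        std::uint64_t id;
        bool isAlloc;   // true = allocation, false = deallocation
    };

    struct Snapshot {
        std::size_t eventIndex;            // last event index applied to this state
        std::set<std::uint64_t> liveIds;   // ids live at that point
    };

    // `snapshots` must be sorted by eventIndex (e.g. one snapshot every X events).
    std::set<std::uint64_t> stateAt(const std::vector<Event>& events,
                                    const std::vector<Snapshot>& snapshots,
                                    std::size_t target) {
        // Binary chop: find the last snapshot taken at or before `target`.
        auto it = std::upper_bound(snapshots.begin(), snapshots.end(), target,
                                   [](std::size_t t, const Snapshot& s) {
                                       return t < s.eventIndex;
                                   });
        std::set<std::uint64_t> live;
        std::size_t start = 0;
        if (it != snapshots.begin()) {
            --it;
            live  = it->liveIds;
            start = it->eventIndex + 1;
        }
        // Replay at most X events on top of the snapshot.
        for (std::size_t i = start; i <= target && i < events.size(); ++i) {
            if (events[i].isAlloc) live.insert(events[i].id);
            else                   live.erase(events[i].id);
        }
        return live;
    }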

But more generally, reducing big O complexity is the means, not the end. If your list is generally 10,000 items then you should worry about making it fast for N=10,000, whether by reducing complexity, or by just making it faster.

Edit: Oops, I just read your question more carefully. If you cache the state every (e.g.) 100 items, you're not searching, so you don't even need to do a binary chop - you just jump directly to the closest cached state and walk at most 100 items to get to the item itself. So that's a constant-time algorithm, no?
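Concretely, the search collapses to an index computation (again just a sketch, assuming one snapshot is stored after every 100th event):

    // With a fixed snapshot interval, the bounding snapshot is found by
    // integer division instead of a binary chop. Illustrative only.
    #include <cstddef>

    constexpr std::size_t kSnapshotInterval = 100;

    // Index of the latest snapshot taken at or before `target`, assuming
    // snapshots[i] holds the state after event i * kSnapshotInterval.
    std::size_t snapshotIndexFor(std::size_t target) {
        return target / kSnapshotInterval;   // O(1); then replay at most 100 events
    }

The per-query cost is then bounded by the snapshot interval, at the price of the memory needed to store the snapshots.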

A: 

'Timestamp' or mark each insertion and deletion; then a simple traversal will find what changed (O(n)).

leppie