Basically, I have a large number of C structs to keep track of that are essentially:

struct Data {
    int key;
    ...        // More data
};

I need to periodically access lots (hundreds) of these, and they must be sorted from lowest to highest key values. The keys are not unique and they will be changed over the course of the program. To make matters even more interesting, the majority of the structures will be culled (based on criteria completely unrelated to the key values) from the pool right before being sorted, but I still need to keep references to them.

I've looked into using a binary search tree to store them, but the keys are not guaranteed to be unique and I'm not entirely sure how to restructure the tree once a key is changed or how to cull specific structures.

To recap in case that was unclear above, I need to:

  1. Store a large number of structures with non-unique and dynamic keys.
  2. Cull a large percentage of the structures (but not free them entirely because different structures are culled each time).
  3. Sort the remaining structures from lowest to highest key value.

What data structures and algorithms would you use to solve this problem? The method needs to be as fast and as memory-efficient as possible, since this is a real-time application.

EDIT: The culling is done by iterating over all of the objects and making a decision for each one. The keys change between the culling/sorting runs. I should have stated that they don't change a lot, but they do change, and they can change multiple times between the culling/sorting runs. (If it helps, the key for each structure is actually a z-order for a Sprite. They need to be sorted before each drawing loop so the Sprites with lower z-orders are drawn first.)

A: 

The general solution to this type of problem is to use a balanced search tree (e.g. an AVL tree, a red-black tree, or a B-tree), which guarantees O(log n) time for insertion, deletion, and lookup, where n is the number of items currently stored in the tree. Duplicate keys are not a problem: either use a tree variant that permits them, or store a list of values under each key.

If you're working in C++, you could try std::multimap<int, yourtype> (std::map requires unique keys, which you don't have). If in C, find or implement some simple binary search tree code, and see if it's fast enough.
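For the C route, here's a minimal sketch of such a tree: unbalanced, with duplicate keys sent to the right subtree so equal keys come out adjacent in an in-order walk. The node layout and function names are just illustrative, and error handling is omitted:

#include <stdlib.h>

struct Data { int key; /* ... more data ... */ };

struct Node {
    struct Data *item;
    struct Node *left, *right;
};

/* Insert an item; duplicate keys go into the right subtree. */
struct Node *tree_insert(struct Node *root, struct Data *item)
{
    if (root == NULL) {
        struct Node *n = malloc(sizeof *n);
        n->item = item;
        n->left = n->right = NULL;
        return n;
    }
    if (item->key < root->item->key)
        root->left = tree_insert(root->left, item);
    else
        root->right = tree_insert(root->right, item);
    return root;
}

/* In-order walk visits items in ascending key order. */
void tree_walk(struct Node *root, void (*visit)(struct Data *))
{
    if (root == NULL)
        return;
    tree_walk(root->left, visit);
    visit(root->item);
    tree_walk(root->right, visit);
}

When a key changes, delete the node and re-insert it with the new key; a balanced variant keeps both operations at O(log n).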

However, if you use such a tree and find it's too slow, you could look into some more fine-tuned approaches. One might be to put your structs in one big array, radix sort it by the integer key, and then cull and re-sort on each pass. Another approach might be to use a Patricia tree.
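To illustrate the radix-sort idea, here's a sketch that assumes the keys are non-negative 32-bit ints (plausible for z-orders); it sorts an array of pointers so the structs themselves never move. The function and buffer names are made up:

#include <string.h>

struct Data { int key; /* ... more data ... */ };

/* LSD radix sort of pointers by key, one byte per pass.
   tmp must have room for n pointers. */
void radix_sort_by_key(struct Data **a, struct Data **tmp, size_t n)
{
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++)
            count[((a[i]->key >> shift) & 0xFF) + 1]++;
        for (int b = 0; b < 256; b++)      /* prefix sums: bucket start offsets */
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++)     /* stable scatter into buckets */
            tmp[count[(a[i]->key >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
}

Each pass is a stable counting sort on one byte of the key, so the cost is O(n) per frame no matter how scrambled the keys get; negative keys would need the sign byte handled specially.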

Joey Adams
A: 

Just stick 'em all in a big array.

When the time comes to do the cull and sort, start by doing the sort. Do an insertion sort. That's right - nothing clever, just an insertion sort.

After the sort, go through the sorted array, and for each object, make the culling decision, then immediately output the object if it isn't culled.
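A minimal sketch of the two passes, using the struct Data from the question; should_cull and draw are hypothetical stand-ins for your real culling decision and output step:

#include <stddef.h>

struct Data { int key; /* ... more data ... */ };

int  should_cull(const struct Data *d);   /* hypothetical: your culling test */
void draw(const struct Data *d);          /* hypothetical: your output step  */

/* Pass 1: insertion sort, ascending by key. Near O(n) when the array
   is still almost sorted from the previous frame. The strict > keeps
   the sort stable, so equal keys keep their relative order. */
void sort_pass(struct Data *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct Data tmp = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1].key > tmp.key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = tmp;
    }
}

/* Pass 2: walk the sorted array, making the culling decision per object
   and immediately outputting the survivors. */
void cull_and_output(struct Data *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!should_cull(&a[i]))
            draw(&a[i]);
}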

This is about as memory-efficient as it gets. It should also require very little computation: there's no bookkeeping on updates between cull/sort passes, and the sort will be cheap, because insertion sort is adaptive and, on an almost-sorted array like this, runs in almost O(n). The one thing it gives up is cache locality: there are two separate passes over the array, one for the sort and one for the cull/output.

If you demand more cleverness, then instead of an insertion sort, you could use another adaptive, in-place sort that's faster. Timsort and smoothsort are good candidates; both are utterly fiendish to implement.

The big alternative to this is to sort only the unculled objects, using a secondary, temporary list of such objects which you sort (or keep in a binary tree or whatever). But the thing is, if the keys don't change that much, then the win you get from using an adaptive sort on an almost-sorted array will (I reckon!) outweigh the win you would get from sorting a smaller dataset: it's O(n) for the adaptive sort versus O(m log m) for sorting the m survivors from scratch.
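For completeness, a sketch of that alternative: gather pointers to the survivors, then sort only those (plain qsort here rather than anything adaptive; should_cull is again a hypothetical stand-in):

#include <stdlib.h>

struct Data { int key; /* ... more data ... */ };

int should_cull(const struct Data *d);    /* hypothetical culling test */

static int by_key(const void *pa, const void *pb)
{
    const struct Data *a = *(struct Data * const *)pa;
    const struct Data *b = *(struct Data * const *)pb;
    return (a->key > b->key) - (a->key < b->key);
}

/* Gather pointers to the survivors, then sort just those.
   out must have room for n pointers; returns the survivor count. */
size_t cull_then_sort(struct Data *all, size_t n, struct Data **out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (!should_cull(&all[i]))
            out[m++] = &all[i];
    qsort(out, m, sizeof *out, by_key);
    return m;
}

Note that qsort isn't guaranteed stable, so sprites with equal z-order could swap drawing order between frames; that's one more point in favour of the stable insertion sort above.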

Tom Anderson