views:

223

answers:

3

I have a collection of items (big rationals) that I'll be processing. In each case, processing will consist of removing the smallest item in the collection, doing some work, and then adding 0-2 new items (which will always be larger than the removed item). The collection will be initialised with one item, and work will continue until it is empty. I'm not sure what size the collection is likely to reach, but I'd expect in the range 1M-100M items. I will not need to locate any item other than the smallest.

I'm currently planning to use a red-black tree, possibly tweaked to keep a pointer to the smallest item. However I've never used one before, and I'm unsure whether my pattern of use fits its characteristics well.

1) Is there a danger the pattern of deletion from the left + random insertion will affect performance, eg by requiring a significantly higher number of rotations than random deletion would? Or will delete and insert operations still be O(log n) with this pattern of use?

2) Would some other data structure give me better performance, either because of the deletion pattern or taking advantage of the fact I only ever need to find the smallest item?

Update: glad I asked, the binary heap is clearly a better solution for this case, and as promised turned out to be very easy to implement.

Hugo

+5  A: 

A heap will give you O(1)O(log n) removal and O(log n) insertion, and is much easier to implement than a red-black tree

BlueRaja - Danny Pflughoeft
Actually, removal is O(log N), **locating (finding the value of)** the minimum / maximum is O(1).
IVlad
I have never seen a heap with 1M-100M items in it, does anyone have some information on how that affects its speed?
NickLarsen
@NickLarsen: that's exactly what Big-O notation is for.
BlueRaja - Danny Pflughoeft
@BlueRaja: I realize that, however, Big-O is in the theoretical world, and 100 million records on a computer is in the physical world. With limited memory, are there better structures than a heap?
NickLarsen
@Nick: A heap can be implemented using an array, so it requires 0 extra space
BlueRaja - Danny Pflughoeft
+11  A: 

A binary heap is a lot better for what you want. It is easier to implement and faster since you only care about locating the smallest element and insertions. Locating the smallest element is O(1), removing it is O(log N), and an insertion is also O(log N).

IVlad
+1  A: 

It's good to know how to create the more complicated data structures if you need to. However, generally your best bet is to start off as simple as you can, and only use something more complex when it turns out you need to.

The only time I ever implemented a self-balancing tree was one time when I happened to know that my tree was going to be really big (over 10,000 elements), and the data was going to come in sorted spurts. That meant if I'd used a normal binary tree, I would have ended up with nearly a linked-list.

If your data is being entered in a randomish order, you really shouldn't bother with a balancing algorithim.

T.E.D.