views: 511

answers: 9

In the vein of programming questions: suppose there's a collection of objects that can be compared to each other and sorted. What's the most efficient way to keep track of the smallest element in the collection as objects are added and the current smallest occasionally removed?

+1  A: 

If you need random insert and removal, the best way is probably a sorted array. Inserts and removals should be O(log(n)).

Kyle Cronin
+6  A: 

Using a min-heap is the best way.

http://en.wikipedia.org/wiki/Heap_(data_structure)

It is tailor made for this application.
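
For concreteness, here is a minimal sketch of the min-heap approach in Java, using java.util.PriorityQueue (a binary min-heap by default); the class name and values are just illustrative, not from the original answer:

    import java.util.PriorityQueue;

    public class MinTracker {
        public static void main(String[] args) {
            // PriorityQueue orders elements by their natural ordering: a min-heap.
            PriorityQueue<Integer> heap = new PriorityQueue<>();

            heap.add(42);   // O(log n) per insert
            heap.add(7);
            heap.add(19);

            System.out.println(heap.peek());  // 7 -- the current smallest, O(1)

            heap.poll();                      // removes the smallest, O(log n)
            System.out.println(heap.peek());  // 19
        }
    }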

jjnguy
A Fibonacci heap is faster than a min-heap: it has O(1) insertion instead of O(log(n))
martinus
I've never heard of a Fibonacci Heap
jjnguy
Here is a link: http://en.wikipedia.org/wiki/Fibonacci_heap
martinus
[this was originally a separate 'answer' that I posted before comments were implemented] That was my initial suggestion until I realized that min-heaps weren't designed for random removal of elements, only the smallest one. Well, whatever, it looks like your answer was accepted anyway.
Kyle Cronin
+1  A: 

@Harpreet
That is not optimal. When an object is removed, erickson will have to scan the entire collection to find the new smallest.

You want to read up on binary search trees. MS has a good site to start down the path. But you may want to get a book like Introduction to Algorithms (Cormen, Leiserson, Rivest, Stein) if you want to deep dive.
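
As an aside not in the original answer: if you happen to be in Java, java.util.TreeSet is a ready-made self-balancing BST (a red-black tree), so you get O(log n) insert/remove plus cheap access to the minimum without writing the tree yourself:

    import java.util.TreeSet;

    public class BstMinTracker {
        public static void main(String[] args) {
            // TreeSet is backed by a red-black tree (a self-balancing BST).
            TreeSet<Integer> tree = new TreeSet<>();

            tree.add(42);   // O(log n)
            tree.add(7);
            tree.add(19);

            System.out.println(tree.first());      // 7 -- current smallest
            System.out.println(tree.pollFirst());  // removes and returns 7, O(log n)
            System.out.println(tree.first());      // 19
        }
    }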

jms
A: 

If you need random insert and removal, the best way is probably a sorted array. Inserts and removals should be O(log(n)).

Yes, but you will need to re-sort on each insert and (maybe) each deletion, which, as you stated, is O(log(n)).

With the solution proposed by Harpreet:

  • you have one O(n) pass in the beginning to find the smallest element
  • inserts are O(1) thereafter (only 1 comparison needed to the already-known smallest element)
  • deletes will be O(n) because you will need to re-find the smallest element (keep in mind Big O notation is worst case). You could also optimize by first checking whether the element being deleted is the (known) smallest, and if it isn't, skip the re-scan entirely.

So, it depends. Harpreet's approach (sketched below) will be better for an insert-heavy use case with few deletes, while the sorted structure is more consistent overall. I think I would default to Harpreet's mechanism unless I knew that the smallest number would be removed often, because that exposes the weak point in that algorithm.
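
For reference, here is a rough sketch of how I read Harpreet's tracked-minimum idea (my own illustration in Java, not his code; the class and method names are made up):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class TrackedMin {
        private final List<Integer> items = new ArrayList<>();
        private Integer smallest;  // cached minimum, or null when empty

        // O(1): a single comparison against the cached minimum.
        public void add(int value) {
            items.add(value);
            if (smallest == null || value < smallest) {
                smallest = value;
            }
        }

        // O(1).
        public Integer getSmallest() {
            return smallest;
        }

        // O(n) worst case: removing the cached minimum forces a full re-scan.
        public void remove(int value) {
            items.remove(Integer.valueOf(value));
            if (smallest != null && value == smallest) {
                smallest = items.isEmpty() ? null : Collections.min(items);
            }
        }
    }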

pbh101
Just wanted to point out that big-O is not necessarily worst-case. It's just notation for describing the asymptotic behavior of a function; it doesn't "know" anything about worst case, best case, or anything in between.
dancavallaro
In fact, I'd go so far as to say that unless it's specified, big-O notation is used most often to describe the average case, not the worst-case.
dancavallaro
A: 

The heap solution is correct, I think. I neglected the removals when I read the question and suggested just keeping track of the smallest. Since the additions and removals are O(log(n)) operations in a heap, it'll be better than using a sorted list. The inserts into that would be linear since you have to move items for an insert.

For worst-case analysis we have to assume the smallest element is the one removed each time. Anything other than a heap will either take O(n) time to find the next smallest or require O(n) time to insert.

edit: well this is fun. It has to be slowing the rate at which I'm getting stupider, at least.

edit: Now I'm convinced that in the average case there has to be something better than a heap, especially with random removals. The BST suggested seems like a good way to go. You'll need to construct a randomized BST, though. Look up treaps for a solution.

Harpreet
A: 

Harpreet:

the inserts into that would be linear since you have to move items for an insert.

Doesn't that depend on the implementation of the collection? If it acts like a linked-list, inserts would be O(1), while if it were implemented like an array it would be linear, as you stated.

pbh101
A: 

Depends on which operations you need your container to support. A min-heap is the best if you might need to remove the min element at any given time, although several operations are nontrivial (amortized log(n) time in some cases).

However, if you only need to push/pop from the front/back, you can just use a min-deque, which achieves amortized constant time for all operations (including findmin). You can do a scholar.google.com search to learn more about this structure. A friend and I recently worked out a much easier-to-understand and easier-to-implement version of a min-deque as well. If this is what you're looking for I could post the details for you.
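
I don't know which min-deque construction Tyler has in mind, but for flavor, here is its simpler classic relative: a FIFO queue built from two min-tracking stacks that gives amortized O(1) offer, poll, and findMin (push at the back, pop at the front only, so it is not a full deque). Names are illustrative:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A FIFO queue with O(1) amortized offer, poll, and findMin.
    // Each stack entry is {value, running minimum of that stack}.
    public class MinQueue {
        private final Deque<int[]> back = new ArrayDeque<>();   // receives new elements
        private final Deque<int[]> front = new ArrayDeque<>();  // serves removals

        public void offer(int value) {
            int min = back.isEmpty() ? value : Math.min(value, back.peek()[1]);
            back.push(new int[] {value, min});
        }

        public int poll() {
            if (front.isEmpty()) {
                // Amortized step: shift everything over, recomputing running minima.
                while (!back.isEmpty()) {
                    int value = back.pop()[0];
                    int min = front.isEmpty() ? value : Math.min(value, front.peek()[1]);
                    front.push(new int[] {value, min});
                }
            }
            return front.pop()[0];
        }

        // Assumes the queue is non-empty.
        public int findMin() {
            if (back.isEmpty()) return front.peek()[1];
            if (front.isEmpty()) return back.peek()[1];
            return Math.min(back.peek()[1], front.peek()[1]);
        }
    }

The overall minimum is simply the smaller of the two stack tops' cached minima, which is why findMin stays constant time.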

Tyler
A: 

pbh101:

Harpreet:

the inserts into that would be linear since you have to move items for an insert.

Doesn't that depend on the implementation of the collection? If it acts like a linked-list, inserts would be O(1), while if it were implemented like an array it would be linear, as you stated.

I was thinking of the array mentioned, but you're right. On the other hand, if you decide on a linked list or array, you will have to pay the full price of sorting up front, and you may be able to avoid some of that with the heap. I guess it will depend on the frequency of inserts and removals, and we would also have to consider searching for the elements to remove. I'm changing my answer to a final: Depends.

Harpreet
+1  A: 

For occasional removes, a Fibonacci heap is even faster than the min-heap: insertion is O(1), and finding the min is also O(1). Removal is O(log(n)) amortized.

martinus