views:

269

answers:

4

This question is from an exam I had, and I couldn't solve it and wanted to see what the answer is (this is not homework, as it will not help me in anything but knowledge).

We need to create a data structure for containing elements whose keys are real numbers.
The data structure should have these functions:
Build(S, array): Builds the data structure S with n elements in O(n)
Insert(S, k) and Delete(S, x) in O(lgn) (k is an element, x is a pointer to it in the data structure)
Delete-Minimal-Positive(S): Remove the element with the minimal positive key
Mode(S): returns the key that is most frequent in S in O(1)

Now, building in O(n) usually means a heap should be used, but that does not allow to find frequencies. I couldn't find any way to do this so. Best I could come up with is building a Red-Black-Tree (O(nlgn)) that will be used to build a frequency heap.

I'm dying to know the answer...

Thanks!

+1  A: 

Well, you can use a hash table to calculate the number of occurrences of distinct real numbers in O(1) amortized time, and then use a standard heap where the items are pairs (real number, number of occurrences) and the heap is sorted according to the number of occurrences field.

When you insert a key or delete a key, you increment or decrement the number of occurrences field by one, or in the extreme cases add or remove a heap element. In both cases you need to percolate up / down because the ordering field has changed.

Assuming the hash table is O(1) operation, you have a standard heap + O(1) hash table and you get all the operations above within the time limits. In particular, you get the "mode" by reading the root element of the heap.

antti.huima
The requirement are O(1) in worst case. Actually, all the requirements are in worst case
CS n00b
In comparison model, there is no solution (see my answer). So hashing is probably the best option to achieve the stated bounds.
Moron
Perhaps we can make use of some form of perfect hashing to do the initial setup.
Moron
Actually it would be O(1) average time, not amortized, no? Unless you're growing the hash table, of course.
Joel
A: 

I think the following solution will be acceptable. It based on two data stuctures:

  1. Red-black tree
  2. Binary heap

Binary heap holds tuple, that contain (element value, frequence of element), heap is builded on frequencies, so it's give us ability to find mode in O(1).

Red black tree contains a tuple that hold (element value, pointer to same element value in heap)

When you need to insert new element, you will try to find element(it takes O(log n)), if search was succeful, than go to the pointer from element founded in RB-tree, increase frequence, and rebuild heap(also O(log n)). If search didn't find such element than insert it into RB-tree(O(log n)) and to heap with value = (element, 1) (also O(1)), set a pointer in RB-tree to new element in heap.

Initial building will take O(n), because building both structures from set of n element takes O(n).

Sorry, if I am miss something.

woo
Using rational numbers, you cannot get the frequency of an element in guaranteed O(1). So building the heap will take more than O(n). If we were talking small natural numbers we could build freq[i] = how many times i appears.
IVlad
Building a red black tree from an unsorted set of elements takes O(nlgn) time.
Joel
Yeah, I have mistake: initial build will has greater compexity than O(n). Sorry :(
woo
@Moron Do you mean inititial build? If yes, I accept that I was wrong about initional stage.
woo
@woo: Yes. Sorry I accidentally deleted the original comment.
Moron
+3  A: 

Using just the comparison model, there is no solution to this problem.

The Element Distinctness Problem has provable Omega(nlogn) lower bounds. This (element distinctness) problem is basically the problem of determining if all the elements of an array are distinct.

If there was a solution to your problem, then we could answer the element distinctness problem in O(n) time (find the most frequent element in O(n) time, and see if there are more than one instances of that element, again in O(n) time).

So, I suggest you ask your professor for the computational model.

Moron
That's what I thought, but he insists there's a way to solve this. Tried thinking about O(n) sorting routines, but having nothing about the keys other than being real numbers (no min/max bounds) I couldn't find anything
CS n00b
@CSn00b: Show him this 'proof' of impossibility and then ask him the model and allowed operations. Did the exam paper specify any specific model of computation?
Moron
@Moron - no, it didn't specify anything about it, but all we've learned is 14 first chapters of Cormen book (nlgn sorts and bucket/radix sort, trees, rb-trees, hashes, heaps)
CS n00b
@CSn00b: Without knowing what is allowed, it becomes a bit tricky. I am curious to know what your professor had in mind. Can't you just ask him the solution and the model he had in mind and post a brief sketch here?
Moron
@CSn00b: Moron is right, I am convinced as well there is no answer... I don't believe there is an O(n)-time sorting algorithm (even with arbitrarily-large space) that can sort real numbers
BlueRaja - Danny Pflughoeft
A: 

For frequencies:
Each entry is bi-directionaly linked to own frequencies/counters (use hash table)
Frequencies are in linked list.
There is need to move frequency up/down over linked list,(deleting inserting element) but for max difference of 1.

Frequencies are thus linked to pointer of +1 and -1 frequency element.

(there are exceptions but can be handled)

ralu