views:

205

answers:

4

What is the best way to find the most recurring values in a set? I'd like to use a one-pass algorithm, assuming that values are from the 1,2,3,4,..,m domain?

If I had to write an algorithm to do that, how would I do that?

A: 
SELECT  value
FROM    table
GROUP BY
        value
ORDER BY
        COUNT(*) desc
LIMIT 1
Quassnoi
Well I know how to do it in form of SQL query..my question is if I have to write a psuedocode for the alogoritham that does this in one-pass algoritham(one tuple at a time), how to do it? assuming that you have memory no greater than log(m).
Then why do you reference SQL at all in your question? It sounds like you're asking for a general-purpose algorithm for an arbitrary data structure representing a relation.
Dave Costa
Ok..sorry for the cofusion...yes that's what I am asking...thank you..
so i just wasted my time trying to help?
DForck42
well it's not wasted...You also answered something I did not know...
You should put all requirements into your question, Nimesh. Especially the memory restriction seems important. On the other hand, this somehow smells like disguised homework.
Svante
Jhonny D. Cano -Leftware-
well no it is not a homework...it's part of the project that I am working on...thank you for your comments
+2  A: 
SELECT value, COUNT(*) frequency
FROM table
GROUP BY value
ORDER BY COUNT(*) DESC
Jhonny D. Cano -Leftware-
Thank you...for your answer
+1  A: 

Store them in a hash table, with a count of how many times each one was stored (O(n)).
Then loop through the buckets (O(n)).

Mike Dunlavey
Thank you mike for your comment...
You're welcome. I like the easy ones.
Mike Dunlavey
What about O(logn)???? Is there a way to get O(log n) instead of O(n)?
@Nimesh: How could you possibly get below O(n)? You have to look at all n numbers -- that alone takes O(n) time!
j_random_hacker
Well, theoretically there's an O(1) method, if the list is finite. Just use the list as an index into a rather big array, and fetch the answer :-)
Mike Dunlavey
@Mike: Sure, but that approach doesn't leave many algorithms that *aren't* O(1)... :)
j_random_hacker
@j: Ain't CS wonderful? All we gotta do is wait for the hardware to catch up! :-)
Mike Dunlavey
A: 

By definition, a set contains only unique values. Thus, the answer should be the set itself, which can be "computed" in constant time. :-)

Seriously though, assuming that you're actually working with a heap, list, vector or some other data structure which allows duplicates, probably the fastest way to solve the problem is the answer from Mike Dunlavey, which is to use a hashtable. There are also some techniques using trees you could use which employ successively more refined estimates. I think such an approach would be O(n log n) (not as good as the hashtable solution), though perhaps it could be as low as O(log n) if you permit some statistical error.

Daniel Spiewak
Thank you for your comment. Values are not unique. Table will have duplicates. It is just that values will be from domain (1,2,3,4...m).