What is the best way to find the most recurring values in a set? I'd like to use a one-pass algorithm, assuming that values are from the 1,2,3,4,..,m domain?
If I had to write an algorithm to do that, how would I do that?
What is the best way to find the most recurring values in a set? I'd like to use a one-pass algorithm, assuming that values are from the 1,2,3,4,..,m domain?
If I had to write an algorithm to do that, how would I do that?
SELECT value
FROM table
GROUP BY
value
ORDER BY
COUNT(*) desc
LIMIT 1
SELECT value, COUNT(*) frequency
FROM table
GROUP BY value
ORDER BY COUNT(*) DESC
Store them in a hash table, with a count of how many times each one was stored (O(n)).
Then loop through the buckets (O(n)).
By definition, a set contains only unique values. Thus, the answer should be the set itself, which can be "computed" in constant time. :-)
Seriously though, assuming that you're actually working with a heap, list, vector or some other data structure which allows duplicates, probably the fastest way to solve the problem is the answer from Mike Dunlavey, which is to use a hashtable. There are also some techniques using trees you could use which employ successively more refined estimates. I think such an approach would be O(n log n) (not as good as the hashtable solution), though perhaps it could be as low as O(log n) if you permit some statistical error.