views:

696

answers:

6

Input : {5, 13, 6, 5, 13, 7, 8, 6, 5}

Output : {5, 5, 5, 13, 13, 6, 6, 7, 8}

The question is to arrange the numbers in the array in decreasing order of their frequency, preserving the order of their occurrence.

If there is a tie, like in this example between 13 and 6, then the number occurring first in the input array would come first in the output array.

+2  A: 

I can think of two solutions:

Make a first pass to count frequencies and store the first index and then sort according to that

This is simple to implement, uses linear (O(N)) additional memory in the worst case and takes O(N log N) time.

Do everything in one pass with some sort of priority queue, where the "priority" is the frequency count (changing as you pass through the input), and the index of the first occurrence to break ties

This does everything in a single pass, but I can't think of any advantage over the other solution. It still requires O(N) additional memory, and will also take O(N log N) time. Also, it requires a more complex data structure.

Martinho Fernandes
My original answer used a stable sort, which is why i had the incorrect pattern ...13,6,13,6... instead of ...13,13,6,6... What is needed instead is to sort by frequency and then by index of the first occurance in the list. My answer does this by putting both parts into a tuple and the natural ordering of the tuples takes care of the rest.
gnibbler
@gnibbler is right. You should sort by freq and ordinal.
andras
@gnibbler, andras: Nicely pointed. And now I can think of another way of doing it that can dismiss that need: a simple variation of counting sort: http://en.wikipedia.org/wiki/Counting_sort
Martinho Fernandes
@andras: That guarantee is enough to guarantee order. Have a look at the heapsort algorithm (http://en.wikipedia.org/wiki/Heapsort). It sorts by putting everything in a heap and then repeatedly removing the root of the heap (i.e. the minimum).
Martinho Fernandes
@Martinho: Thank you - but I think there is a misunderstanding here: what I mean is that your claim about doing it in a *single pass* is wrong. You have to do a second pass to actually sort the items in your priority queue - just as you do it in heapsort after you heapify in the first pass.
andras
+1  A: 

In Python2.7 or Python3.1

>>> from collections import Counter
>>> L=[5, 13, 6, 5, 13, 7, 8, 6, 5]
>>> c=Counter(L)
>>> def keyfunc(x):
...     return (-c.get(x),L.index(x))
... 
>>> sorted(L,key=keyfunc)
[5, 5, 5, 13, 13, 6, 6, 7, 8]

In Python2.6

>>> from collections import defaultdict
>>> L=[5, 13, 6, 5, 13, 7, 8, 6, 5]
>>> c=defaultdict(int)
>>> for x in L:
...     c[x]+=1
... 
>>> def keyfunc(x):
...     return (-c.get(x),L.index(x))
... 
>>> sorted(L,key=keyfunc)
[5, 5, 5, 13, 13, 6, 6, 7, 8]

Here is a version that doesn't use any library functions (weird constraint)

>>> L=[5, 13, 6, 5, 13, 7, 8, 6, 5]
>>> c={}
>>> for x in L:
...     c[x]=c.setdefault(x,0)+1
... 
>>> def keyfunc(x):
...     return (-c.get(x),L.index(x))
... 
>>> sorted(L,key=keyfunc)
[5, 5, 5, 13, 13, 6, 6, 7, 8]

In each case, keyfunc is used to control the ordering of the sort

keyfunc(5) returns (-3,0)
keyfunc(6) returns (-2,2)
keyfunc(7) returns (-1,5)
keyfunc(8) returns (-1,6)
keyfunc(13) returns (-2,1)

The list items are sorted according to the return value of keyfunc

gnibbler
Need concrete algo or code not any API or library uses.
Rajendra Kumar Uppal
In management1.0 say("intern: sort this list of numbers into decreasing order of frequency and put it on my desk")
Martin Beckett
Rajendra, this is an implementation of the first approach in Martinho's answer
gnibbler
The output is not as expected...
Lucero
@Rajendra: you have your algorithm right there. Just look at the code carefully: first you count, and then you sort, with the sort key being the count (`c.get`).
Martinho Fernandes
@gnibbler: I did not run the code (no python here :( ) but your own results are wrong... [..., 13, 6, 13, ...]? Was that a typo, or the real result?
Martinho Fernandes
@Lucero, Martinho, missed the part about ties. fixed now
gnibbler
Sorry if I seem pedantic, but what about `sorted`? Doesn't that count as a library function? (Not that I expect you to write out a sort function for this trivial question ;)
Martinho Fernandes
@Martinho Fernandes, sorted is a builtin - no more of a library function than say print or str. See http://docs.python.org/library/functions.html for more info
gnibbler
+1  A: 

You can probably do it in one pass with a bubble sort type algorithm where you keep a note of the count of the previous value and swap bunches of numbers when you find more of another number.

But - as a first step you should do a 2pass solution, with a std::map/pair to store number or if you are told the range of values in advance use a simple array.

Martin Beckett
A: 

Sort the array, and in your sort function for x, y: sort by count(x) vs. count(y). If they are the same, sort by index(x) vs. index(y)

In python:

input = [5, 13, 6, 5, 13, 7, 8, 6, 5]
orig = list(input)

def cmp(x, y):
    if (orig.count(y) - orig.count(x) != 0):
        return orig.count(y) - orig.count(x)
    return orig.index(x) - orig.index(y)   

input.sort(cmp) 
print input

To make it more efficient, precalculate counts and indexes before sorting the array.

Tuomas Pelkonen
To make it even more efficient use a key function instead of cmp. cmp is gone in Python3
gnibbler
+5  A: 

I guess I'd do it like this:

Use a key-value data structure where the number itself is the key and the occurrence count and first occurrence index are the value.

Now traverse all numbers. If the number is not yet known (in the data structure), add it, remembering the current index as well as 1 as count. Otherwise, increment the count.

Now sort the data structure contents by the occurrence count (decreasing) and the occurrence index (increasing), and output the result (repeating the number using the occurrence count).

Processing space used <= N, time used depending on data structure and dictionary would probably be about O(N log N)

Lucero
+1: in `C++` it would give: `std::map<Number, std::pair<Count, Index> >`. Could you offer a code sample ? (for example in Python, since it's usually close the a formal algorithm)
Matthieu M.
I came up with the same answer as this and implemented in C# in my post here: http://stackoverflow.com/questions/2523883/how-to-properly-translate-the-var-result-of-a-lambda-expression-to-a-concrete-t
CrimsonX
+1  A: 

C#. It takes the most obvious approach: sort by frequency and ordinal.

Output: 5 5 5 13 13 6 6 7 8. Space: O(n). Time: O(n log n).

class Program
{
    class FreqAndOrdinal
    {
        public int Frequency;
        public int Ordinal;
        public FreqAndOrdinal(int freq, int ord)
        {
            this.Frequency = freq;
            this.Ordinal = ord;
        }
    }

    static int Compare(FreqAndOrdinal x, FreqAndOrdinal y)
    {
        int result = y.Frequency.CompareTo(x.Frequency);
        return result == 0 ? x.Ordinal.CompareTo(y.Ordinal) : result;
    }

    static void Main(string[] args)
    {
        int[] nums = new int[] { 5, 13, 6, 5, 13, 7, 8, 6, 5 };
        var freqLookup = new Dictionary<int, FreqAndOrdinal>(nums.Length);
        for (int i = 0; i < nums.Length; i++)
        {
            FreqAndOrdinal tmp;
            if (freqLookup.TryGetValue(nums[i], out tmp))
                ++tmp.Frequency;
            else
                freqLookup[nums[i]] = new FreqAndOrdinal(1, i);
        }

        Array.Sort(nums, (x,y) => Compare(freqLookup[x], freqLookup[y]));

        for (int i = 0; i < nums.Length; i++)
        {
            Console.Write(" {0}", nums[i]);
        }
        Console.ReadKey();
    }
}
andras