tags:

views:

233

answers:

5

I have an unsorted array (of generated die rolls), and I want to select the top N elements of it (which is trivial with sort-and-snip), but mark them while keeping them in order.

E.g.:

mark([1,2,3,4], 3) ==> [[1, false], [2, true], [3, true], [4, true]]

mark([3,5,2,6,2], 3) ==> [[3, true], [5, true], [2, false], [6, true], [2, false]]

The array may contain any values from 1 up, and be of any length, and the amount of marked elements is variable.

I could live with

mark([3,5,2,6,2], 3) ==> [[3, true], [5, true], 2, [6, true], [2, true]]

(I.e., numbers that'd be marked false to go unmarked), but I'd rather avoid it.

What's mandatory is that the order of the array stays unchanged.

If the top elements are repeated (eg: top 3 of [6,6,6,6,6,6]), mark the first 3 elements.

(N is sufficiently small for complexity not to matter much.)

EDIT: Bonus point: add a parameter to switch between "top" and "bottom" mode.

A: 

If it's small enough for complexity to not matter: (psuedocode)

for(int m = 0; m < mark_count; m++) {
    highest = MIN_INT;
    highestindex = -1;
    foreach i in array:
        if array[i] > highest && is_unmarked(i)
            highest = array[i]
            highestindex = i;
    mark(i)
}

EDIT: If you want to find the bottom ones instead, start out our counter at MAX_INT and check that the value in the array is less than it.

And if you want sample implementations of mark() and is_unmarked:

function mark(i) {
    array[i] = [array[i], true];
}

function is_unmarked(i) {
    if (array[i] is array & array[i][1] == true)
        return false;
    return true;
}

(Not sure whether is works as I expect it to - but the meaning is clear, I hope)

Anon.
My code's compiling.
Anon.
Chosen for being the only one who actually answered the question.
Tordek
A: 

You could use the quick select algorithm with an array of indices. Rather than manipulating the passed in array, you'd manipulate the order of the indices, and then mark the top N. This would take linear time.

dsimcha
As I said, picking the top N elements is trivial, and since N is small enough for N log N to be good-enough, `head(n, sorted(array))` gives me the top N elements. The question regards the marking implementation.
Tordek
Well, you'd have the indices in my solution. You could then index the array and make the changes.
dsimcha
A: 

I would optimize this if needed in the following way. If it does not need to be optimized then do what you will.

  1. If array is smaller than N "top N" value then mark all TRUE. O(N)
  2. Go through arr of items and mark all false while at the same time maintain a list of arr positions and values of the top N items. Additionally you only need to check/compare against the Nth item, lowest, greater(topN[N-1], arr[i]). While the topN is filling to N when iterating over arr[0] to arr[N-1] there should be a condition to just add it to the top N list until N is filled, still keeping the lowest at the bottom. O(arr.size())
  3. Now that all are marked false loop over top N list and go back and mark in arr the positions that as true as they were in the top N. O(N)

So the overall cost is O(N + arr.size())

Ted Johnson
You don't normally include coefficients or constants with Big-O notation.
dreamlax
+1  A: 

I assume we're talking about PHP here, because the question is tagged PHP. Any clever algorithm you will try to implement will be slower than using a built-in function. That's just how PHP works, it's not good at crunching numbers in userspace.

So what you need to do is sort a copy of the array and save the keys of the top N elements, then iterate through the array and mark said elements. But there's a catch: PHP's sort is not stable. Which means that if you want to use the elements' positions in case of ties, you'll have to do it yourself. So instead of using a function such as asort() or arsort(), you'll have to use array_multisort().

The result is this:

function mark(array $arr, $n, $order = SORT_DESC)
{
    $keys = $values = $position = array();

    $i = 0;
    foreach ($arr as $k => $v)
    {
        $keys[]     = $k;
        $values[]   = $v;
        $position[] = $i;

        ++$i;
    }

    // sort values in given $order, use their position as tiebreaker (always in ascending order)
    array_multisort($values, $order, $position, SORT_ASC, $keys);
    $mark = array_flip(array_slice($keys, 0, $n));

    $ret = array();
    foreach ($arr as $k => $v)
    {
        $ret[] = array($v, isset($mark[$k]));
    }

    return $ret;
}

Which produces

SORT_DESC
[3,6,6,6,6,6,2] => [[3,false],[6,true],[6,true],[6,true],[6,false],[6,false],[2,false]]
[3,5,2,6,2]     => [[3,true],[5,true],[2,false],[6,true],[2,false]]

SORT_ASC
[3,6,6,6,6,6,2] => [[3,true],[6,true],[6,false],[6,false],[6,false],[6,false],[2,true]]
[3,5,2,6,2]     => [[3,true],[5,false],[2,true],[6,false],[2,true]]
Josh Davis
+1  A: 

The currently accepted answer scans the input list m times. This one scans it just twice. O(n) vs O(n*m). You need a heap data structure. Here it is in Python.

import heapq

def mark(data, n):
    top = heapq.nlargest(n, ((value, index) for index, value in enumerate(data)))
    indexes = set(value[1] for value in top)
    return [[value, index in indexes] for index, value in enumerate(data)]

print mark([1, 2, 3, 4], 3)
print mark([3, 5, 2, 6, 2], 3)

Output:

[[1, False], [2, True], [3, True], [4, True]]
[[3, True], [5, True], [2, False], [6, True], [2, False]]
FogleBird
O(m*n) vs O(n). You're still scanning the list.
Tordek
Oops, not sure where my head was.
FogleBird