views:

848

answers:

16
  1. Suppose given an array of size n, with sorted values.
  2. In iteration i, a new random-generated value is given, and inserted into the end of the array.
  3. The array is then resorted, and discard the least value item.
  4. After iteration n, the retained array will contain the largest value items.

For example, in Java syntax, it will be something like:

List l = new ArrayList();
l.add(new Integer(2));
l.add(new Integer(3));
l.add(new Integer(6));
l.add(new Integer(9));

Random rand = new Random();
for (int i=0; i < n; i++) {
  l.add(new Integer(rand.nextInt(1000)));
}    
Collections.sort(l);
l.remove(0);

But it seems it's inefficient. Any better algorithm?

+11  A: 

Use a binary insert (works like a binary search) for the new value. Discard the smallest. Should be quite fast.

By the way - this can be implemented as a handy extension method:

private static int GetSortedIndex( this IList list, IComparer comparer, object item, int startIndex, int endIndex )
{
  if( startIndex > endIndex )
  {
    return startIndex;
  }
  var midIndex = startIndex + ( endIndex - startIndex ) / 2;
  return comparer.Compare( list[midIndex], item ) < 0 ?
    GetSortedIndex( list, comparer, item, midIndex + 1, endIndex ) :
    GetSortedIndex( list, comparer, item, startIndex, midIndex - 1 );
}

public static void InsertSorted( this IList list, IComparer comparer, object item )
{
  list.Insert( list.GetSortedIndex( comparer, item ), item );
}
tanascius
If 'list' is an ArrayList (as in the asker's example), this is still O(n).
Nick Johnson
It's quite slow since inserting into a List is slow.
Dario
The question is tagged 'java', therefore no extension methods.
Michael Myers
That's right - I forgot it, when I edited the answer. Maybe it helps some .Net people - so I don't remove it.
tanascius
+3  A: 

A very simple optimalization would be to compare the lowest value in the sorted array (should thus be the first item) with the new value before inserting it. If the new value is greater than this value, replace the element with the new value and then resort the array.

Thorarin
+1  A: 

You can use binary search to insert a value into a sorted array.

Edouard A.
+7  A: 

Use a TreeSet instead of a List, it'll maintain the order such that such that the largest value will always be at SortedSet#last(). If using 1.6+ you can use NavigableSet methods; pollLast() will return and remove the highest value.

NavigableSet<Integer> set = new TreeSet<Integer>();

//... setup data

Integer highest = set.pollLast();

set.add(rand.nextInt(1000));

Integer newHighest = set.pollLast();
Gareth Davis
+4  A: 

Use a min-heap to store the data, and after each insertion of a new random value, delete the min in O(1) time.

After n iterations, perform n extract-min's to get the sorted list.

Suvesh Pratapa
I interpret the question so that "the array remains sorted" and "the least element is removed" are _both_ postconditions, not only the latter.
Daniel Daranas
I'm sorry, I think my method would work even if we wanted a sorted list. Changed the answer to reflect that fact.
Suvesh Pratapa
+1  A: 

If you're working with an ArrayList, you can replace the last number in the array with the new number if the new number is larger before you sort the array.

The Java Collections.sort uses merge sort which isn't the most efficient way of sorting in this situation. You want to use a binary search to find the insertion point and then shift all the subsequent numbers along by one.

EDIT: This can all be done with just an array like so:

public static int addDiscard(int[] list, int number)
{
    if (number > list[list.length - 1])
    {
     int index = findInsertionIndex(list, number); // use binary search
     for (int i = list.length - 1; i > index; i--)
     {
      list[i] = list[i - 1];
     }
     list[index] = number;
    }
}
David Johnstone
+1  A: 

I don't know whether you can change the data structure, or what other operations you need to support, but a heap would be a better fit for the kind of operations you describe.

Svante
+1  A: 

This will keep the size at 4 and do what you want as I understand it.

SortedSet<Integer> set = new TreeSet<Integer>();
set.add(2);
set.add(3);
set.add(6);
set.add(9);
Random rand = new Random();
for (int i=0; i < n; i++) {
  int i = rand.nextInt(1000);
  set.remove(set.first());
  set.add(i);
}
cletus
+2  A: 

The fastest algorithm I can think of would be to replace the smallest element with the new one, if needed, and push the new one to its proper place by repeatedly swapping with adjacent elements.

EDIT: The code assumes that the array is sorted in descending order, and thus the last element is the smallest.

void Insert(int[] array, int newValue)
{
    // If the new value is less than the current smallest, it should be
    // discarded
    if (new_value <= array[array.length-1])
        return;

    array[array.length-1] = newValue;
    for (int i = array.length-1; i > 0; --i)
    {
        if (newValue <= array[i-1])
            break;

        // Swap array[i] with array[i-1]
        array[i] = array[i-1];
        array[i-1] = newValue;
    }
}
Bojan Resnik
Uh how is this the fastest algorithm? If I am understanding correctly, this is O(N). You can utilise binary search to do this in O(NlogN), which would be much faster.
Jean Azzopardi
O(NlogN) is slower than O(N). If you think about it, even if you do a binary search to find the position, you still need to move all the elements. So, you have an extra step.
Bojan Resnik
Yes. Either you're keeping the data as an array, in which case you have to move everything to make space, so binary search ends up with O(N + logN) = O(N) or you have a linked list type of structure, in which case binary search doesn't really work. Note first though that you have to test if the new element is smaller than the smallest element, in which case you should do nothing.
Brian Postow
+2  A: 

Collections.binarySearch()

ArrayList.ensureCapcity()

Your pseudocode inserts a set of new items N into sorted list A of size S and then discards the smallest item. Use Collections.binarySearch() to find the insertion point. [Read the note the performance impact if your List is does not support RandomAccess. ArrayList does support RandomAccess.]

List<Integer> l = new ArrayList<Integer>();
l.add(new Integer(2));
l.add(new Integer(3));
l.add(new Integer(6));
l.add(new Integer(9));

l.ensureCapacity(l.size()+n);

Random rand = new Random();
for (int i=0; i < n; i++) {
  final Integer newInt = Integer.rand.nextInt(1000);
  int insertPoint = Collections.binarySearch(l, newInt);
  if (insertPoint < 0)  insertPoint = -(insertPoint + 1);
  l.add(insertPoint, newInt);
}
l.remove(0);

But, are you sure you wanted to discard just 1 item? Or did you mean to insert a set of new items N into sorted list A of size S and keep just the S largest items. In that case, keep track of the min value:

int min = l.get(0);
l.ensureCapacity(l.size()+n);

Random rand = new Random();
for (int i=0; i < n; i++) {
  final Integer newInt = Integer.rand.nextInt(1000);
  if (newInt > min) {
    int insertPoint = Collections.binarySearch(l, newInt);
    if (insertPoint < 0)  insertPoint = -(insertPoint + 1);
    l.add(insertPoint, newInt);
  }
}

However, if N is large, you may be better off sorting N into an sorted array by itself, discarding the smaller of N(0) or A(0), and then merging the two sorted arrays together [left as an exercise for the reader].

If you end up using an actual array, see Arrays.binarySearch and System.arraycopy.

Bert F
A: 

ShellSort and Natural Mergesort are very performant (< O(n logn)) on largely pre-sorted data. Inserting into a sorted list with binary search needs much more time since one update requires O(n) anyway.

Alternatively, you could use heap-datastructures.

Dario
A: 

Do you really need an online one item at a time algorithm? Or are you actually parsing a larger collection of data and just want the top n items? If it's the latter, look at partial qsort.

Jason Watkins
A: 

I'm not sure if the above example would work, what is n? and if you cycle through adding random #'s from 1 to 1,000 you'll always end up with 1000, 999, 998 and 997 - no? I don't think adding a # and then resorting each time is efficient- it would probably be quicker to check each of the four positions and replacing with a higher #.

A lot depends on how many random #'s you'll add, to few # adds and check each of the 4 positions a lot of # adds just assume you get the highest in the range.

meade
+5  A: 

I'm quite surprised no-one has mentioned this yet... The data structure you are looking for is a priority queue. It is without doubt the most efficient way of accomplishing this task. A priority queue can be implemented using a number of different methods (see the linked article), but the most common is based on a binary heap. In the self-binary variety (which is quite typical), insertion and deletion both take O(log n) time.

There seems to be a built-in generic class in the Java library, PriorityQueue<E>, so it would seem you can use this directly. This type does not surprisingly seemed to be based on a heap data structure, though more specific than that I cannot say. It should be very appropiate for your use, in any case.

Noldorin
Of course, the thing to point out is that a binary heap is not going to be sorted at the end (well I guess there is a slight chance that it could be ordered, but it is unlikely). If it is sufficient to just have the N largest items, then you are okay. If it is required to be sorted at the end then a copy will have to be made, pulling each item out and inserting into a new array, taking O(Nlog(N)) additional work at the end.
Dolphin
Suvesh already mentioned this: http://stackoverflow.com/questions/985843/sort-a-sorted-array/985874#985874
erickson
Yes, indeed. However, the question doesn't appear to state that the order of the items in the end collection is of importance. Even with the additional "pulling each item out" at the end, I believe this method is still the most efficient.
Noldorin
@erickson: So it seems he did - though quite vaguely. I think the detail my answer offers (apart from giving it the exact name of a "priority queue") is worthwhile.
Noldorin
Reason for downvote, please?
Noldorin
a `PriorityQueue` does not allow elements to be removed from the end
nimcap
A: 

A key question is whether you need to know the top 4 items AFTER EVERY NEW ITEM IS GENERATED, or if you only need the top 4 after all the items are generated. Also, is it literally 4 top items, or is that just an example or illustration?

Because if you really are generating thousands of values and only want the top 4, I'd think that comparing each new value to each of the existing 4 and discarding if less than all of them would be a lot faster than doing many sorts. That's just 4 compares for each new item, rather than the potentially much larger number to do repeated sorts.

Similarly if you only need the top N at the end of the process, it may well be faster to collect them all, sort, and then take the top N. But again, if most of the values are being eliminated, sorting the relative positions of the "losers" could be a big waste of time. If we only want the top 4, then whether an item is #5 or #10,382,842 is irrelevant.

Jay
A: 

Here is another solution which merges the operations into just a search, an array copy and a value set. This avoid the need for sorting or loops.

public static <T extends Comparable<T>> 
        void insertAndRemoveSmallest(T[] array, T t) {
    int pos = Arrays.binarySearch(array, t);
    if (pos < 0) pos = ~pos;
    // this is the smallest entry so no need to add it and remove it.
    if (pos == 0) return;
    pos--;
    // move all the entries down one.
    if (pos > 0) System.arraycopy(array, 1, array, 0, pos);
    array[pos] = t;
}

This program

public static void main(String... args) {
    Integer[] ints = {2, 3, 7, 6, 9};
    System.out.println("Starting with " + Arrays.toString(ints));
    for (int i : new int[]{5, 1, 10, 8, 8}) {
        insertAndRemoveSmallest(ints, i);
        System.out.println("After adding " + i + ": " + Arrays.toString(ints));
    }
}

prints

Starting with [2, 3, 7, 6, 9]
After adding 5: [3, 5, 7, 6, 9]
After adding 1: [3, 5, 7, 6, 9]
After adding 10: [5, 7, 6, 9, 10]
After adding 8: [7, 6, 8, 9, 10]
After adding 8: [6, 8, 8, 9, 10]
Peter Lawrey