views: 250
answers: 8

I'm programming in Java. Every 100 ms my program gets a new number.

It has a cache which contains the history of the last n = 180 numbers. When I get a new number x I want to calculate how many numbers in the cache are smaller than x. Afterwards I want to delete the oldest number from the cache.

Every 100 ms I want to repeat the process of calculating how many smaller numbers there are and deleting the oldest number.

Which algorithm should I use? I would like to optimize for making the calculation fast, as it's not the only thing being calculated in those 100 ms.

+3  A: 

Add your numbers to a list. If size > 180, remove the first number. Counting is just iterating over the 180 elements, which is probably fast enough. It's hard to beat performance-wise.
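
A minimal sketch of that idea (the class and method names here are made up for illustration):

import java.util.LinkedList;

class NumberCache {

    private static final int N = 180;
    private final LinkedList<Integer> cache = new LinkedList<Integer>();

    // Counts cached numbers smaller than x, then inserts x and evicts the oldest.
    public int addAndCountSmaller(int x) {
        int smaller = 0;
        for (int value : cache)       // O(N) scan over at most 180 elements
            if (value < x)
                smaller++;
        cache.add(x);                 // newest at the end
        if (cache.size() > N)
            cache.removeFirst();      // oldest at the front
        return smaller;
    }
}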

Eiko
Nice and simple :) For such small arrays being O(n) doesn't matter.
CodeInChaos
A: 

Let the cache be a list: insert new numbers at the start, so the oldest number is at the end, where it can be removed.

Then after every insertion just scan the whole list and calculate the number you need.

Thorbjørn Ravn Andersen
+1  A: 

You can use a LinkedList implementation.

With this structure, you can easily manipulate the first and the last elements of the list (addFirst, removeFirst, ...). For the algorithm (finding how many numbers are lower/greater), a simple loop over the list is enough, and will give you the result in well under 100 ms on a 180-element list.

Benoit Courtine
+6  A: 

You can keep an array of 180 numbers and an index to the oldest one, so that when a new number comes in you overwrite the number at the oldest index and increment the index modulo 180 (it's a bit more complex than that, since you need special behaviour for the first 180 numbers).

As for calculating how many numbers are smaller, I would use the brute-force way (iterate over all the numbers and count).
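
A rough sketch of that ring buffer (all names here are illustrative; the first-180-numbers special case is handled with a size counter):

class RingBufferCache {

    private final int[] buf = new int[180];
    private int oldest = 0;   // next slot to overwrite
    private int size = 0;     // handles the first 180 numbers

    // Counts cached numbers smaller than x, then overwrites the oldest with x.
    public int addAndCountSmaller(int x) {
        int count = 0;
        for (int i = 0; i < size; i++)   // brute-force scan
            if (buf[i] < x)
                count++;
        buf[oldest] = x;                 // overwrite the oldest entry
        oldest = (oldest + 1) % buf.length;
        if (size < buf.length)
            size++;
        return count;
    }
}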


Edit: I find it funny that the "optimized" version runs five times slower than this trivial implementation (thanks to @Eiko for the analysis). I think this is because when you use trees and maps you lose data locality and get many more cache misses (not to mention memory allocation and garbage collection).

Motti
+1. A ring buffer beats ArrayList and LinkedList. And full iteration to get the percentile seems not too bad either.
Thilo
But his cache should hold only 180 (+1) numbers anyway.
Eiko
@Eiko, I don't get your point; the cache holds 180 elements as described in the question, and the +1 is the parameter.
Motti
Ah, sorry... didn't catch the ring buffer. I thought you wanted to keep the age of each element.
Eiko
@Motti, regarding your edit, I find it interesting that the problem *can* be solved in *O(log n)*. Unfortunately, however, I've become hopelessly theoretical. I guess it's an occupational hazard after a few years of graduate studies in formal methods :-\
aioobe
There is nothing surprising about the fact that a theoretically better algorithm might perform slower on small datasets.
Moron
@Moron, I agree, which makes the premature optimization even worse
Motti
@Motti: Premature optimization isn't the term I would use for aioobe's solution!
Moron
@Moron, how come?
Motti
@Motti: Well, I would say it is a different choice of algorithm/design rather than premature optimization! Picking the right algorithm depends on a lot of factors not available to us. For instance, the OP might have said n=180, but is that 100% guaranteed? What systems will it run on? You might benchmark your solutions all you want, but it does not really matter until the OP actually does it on the systems he expects it to run on. Calling it premature optimization is just wrong and, well... premature.
Moron
+10  A: 

For practical reasons and reasonable values of n, you're best off with a ring buffer of primitive ints (keeping track of the oldest entry), and a linear scan to determine how many values are smaller than x.

In order for this to be O(log n) you would have to use something like Guava's TreeMultiset. Here is an outline of how it would look.

import java.util.*;

class Statistics {

    private final static int N = 180;
    Queue<Integer> queue = new LinkedList<Integer>();
    SortedMap<Integer, Integer> counts = new TreeMap<Integer, Integer>();

    public int insertAndGetSmallerCount(int x) {

        queue.add(x);                                // O(1)
        counts.put(x, getCount(x) + 1);              // O(log N)

        int lessCount = 0;                           // O(N), unfortunately
        for (int i : counts.headMap(x).values())     // use Guava's TreeMultiset
            lessCount += i;                          // for O(log N)

        if (queue.size() > N) {                      // O(1)
            int oldest = queue.remove();             // O(1)
            int newCount = getCount(oldest) - 1;     // O(log N)
            if (newCount == 0)
                counts.remove(oldest);               // O(log N)
            else
                counts.put(oldest, newCount);        // O(log N)
        }

        return lessCount;
    }

    private int getCount(int x) {
        return counts.containsKey(x) ? counts.get(x) : 0;
    }

}

On my 1.8 GHz laptop, this solution performs 1,000,000 iterations in about 13 seconds (i.e. one iteration takes about 0.013 ms, well under 100 ms).
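
For completeness, a hedged sketch of the TreeMultiset variant mentioned above (assuming a Guava version that includes SortedMultiset.headMultiset; the class name is made up). TreeMultiset keeps subtree sizes, which is what should make the count O(log N) instead of the O(N) loop above:

import com.google.common.collect.BoundType;
import com.google.common.collect.TreeMultiset;
import java.util.LinkedList;
import java.util.Queue;

class GuavaStatistics {

    private final static int N = 180;
    Queue<Integer> queue = new LinkedList<Integer>();
    TreeMultiset<Integer> counts = TreeMultiset.create();

    public int insertAndGetSmallerCount(int x) {
        queue.add(x);                              // O(1)
        counts.add(x);                             // O(log N)

        // headMultiset view of all elements < x; size() uses subtree counts
        int lessCount = counts.headMultiset(x, BoundType.OPEN).size();

        if (queue.size() > N)                      // evict the oldest entry
            counts.remove(queue.remove());         // O(log N), one occurrence

        return lessCount;
    }
}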

aioobe
Since there are only 180 numbers and recalculation only occurs every 100 ms, I'd definitely optimize for readability and not for speed.
CodeInChaos
+1: Almost the same solution as I got.
Tomas Narros
@CodeInChaos, I don't think it would be more readable with a list. Besides, who says 180 is cast in stone? ;)
aioobe
A time comparison to the linear approach would be cool. ;-)
Eiko
@Eiko, yeah, especially in the form of a graph, with `n` increasing along the x-axis ;-)
aioobe
I did some performance tests on the 180 case - inserted 1 million entries: Queue+Tree 1380 ms, Queue alone: 1300 ms, plain int[] ring buffer: 210 ms.
Eiko
@Eiko, now repeat the tests with larger and larger value for `n` :-)
aioobe
Besides performance difficulties, _TreeSet_ will have a problem with repeating numbers. (Although, you could certainly use STL's multiset, sacrificing some more performance. Not sure if multiset provides ordering, though)
Nikita Rybak
@Nikita Rybak, ah, good point!
aioobe
I think sorted.headSet().size() is O(N), not O(log N). The number of returned elements is random, so on average N/2, and counting N/2 elements is O(N).
mb14
@Nikita, updated to a SortedMap instead.
aioobe
@mb14, ah, good point... :-(
aioobe
You'd better remove zeroes from the _counts_ map if you don't want it to contain a million elements after a million iterations. +1 for dedication, though :) I think you proved that whoever wants to solve this in O(log n) efficiently is better off writing his own container.
Nikita Rybak
@aioobe: Have you tried to time the vanilla version (I mean without the TreeMap)? Your code could easily be modified to do so, and it would be interesting to compare the times.
mb14
@Nikita, good point again. Thought at first that it wouldn't matter, but I realize it does, if the OP has a large spread of input numbers.
aioobe
@mb14, you mean with a ring buffer? Not as fun to write, as it's obvious that it won't scale as well for large `N`.
aioobe
@aioobe: no, just count from the queue (and remove all the TreeMap things). O(N), I know, but as N is low that might be faster (if you remove all the sorting overhead).
mb14
@aioobe: even if it's 'obvious', it's still interesting to see when (I mean for which value of N) the optimized version is faster. You might be surprised. (Anyway, I'm not speaking of a ring buffer but of counting from a linked list.)
mb14
+1  A: 

You could try a custom linked-list data structure where each node maintains next/prev as well as sorted next/prev references. Inserting then becomes a two-phase process: first always insert the node at the tail, then insertion-sort it into the sorted chain; the insertion sort returns the count of numbers less than x. Deleting is simply removing the head.

Here is an example. NOTE: THIS IS VERY NASTY JAVA, IT IS EXAMPLE CODE PURELY TO DEMONSTRATE THE IDEA. You get the idea! ;) Also, I'm only adding a few items, but it should give you an idea of how it would work... The worst case for this is a full iteration through the sorted linked list, which is no worse than the examples above, I guess?

import java.util.*;

class SortedLinkedList {

  public static class SortedLL<T>
  {
    public class SortedNode<T>
    {
      public SortedNode(T value)
      {
        _value = value;
      }

      T _value;

      SortedNode<T> prev;
      SortedNode<T> next;

      SortedNode<T> sortedPrev;
      SortedNode<T> sortedNext;
    }

    public SortedLL(Comparator<T> comp)
    {
      _comp = comp;
      _head = new SortedNode<T>(null);
      _tail = new SortedNode<T>(null);
      // Setup the pointers
      _head.next = _tail;
      _tail.prev = _head;
      _head.sortedNext = _tail;
      _tail.sortedPrev = _head;
      _sortedHead = _head;
      _sortedTail = _tail;      
    }

    int insert(T value)
    {
      SortedNode<T> nn = new SortedNode<T>(value);

      // always add node at end
      nn.prev = _tail.prev;
      nn.prev.next = nn;
      nn.next = _tail;
      _tail.prev = nn;

      // now second insert sort through..
      int count = 0;
      SortedNode<T> ptr = _sortedHead.sortedNext;
      while(ptr.sortedNext != null)
      {
        if (_comp.compare(ptr._value, nn._value) >= 0)
        {
          break;
        }
        ++count;
        ptr = ptr.sortedNext;
      }  

      // update the sorted pointers..
      nn.sortedNext = ptr;
      nn.sortedPrev = ptr.sortedPrev;
      if (nn.sortedPrev != null)
        nn.sortedPrev.sortedNext = nn;
      ptr.sortedPrev = nn;

      return count;            
    }

    void trim()
    {
      // Remove from the head...
      if (_head.next != _tail)
      {
        // trim.
        SortedNode<T> tmp = _head.next;
        _head.next = tmp.next;
        _head.next.prev = _head;

        // Now updated the sorted list
        if (tmp.sortedPrev != null)
        {
          tmp.sortedPrev.sortedNext = tmp.sortedNext;
        }
        if (tmp.sortedNext != null)
        {
          tmp.sortedNext.sortedPrev = tmp.sortedPrev;
        }
      }
    }

    void printList()
    {
      SortedNode<T> ptr = _head.next;
      while (ptr != _tail)
      {
        System.out.println("node: v: " + ptr._value);
        ptr = ptr.next;
      }      
    }

    void printSorted()
    {
      SortedNode<T> ptr = _sortedHead.sortedNext;
      while (ptr != _sortedTail)
      {
        System.out.println("sorted: v: " + ptr._value);
        ptr = ptr.sortedNext;
      }      
    }

    Comparator<T> _comp;

    SortedNode<T> _head;
    SortedNode<T> _tail;    

    SortedNode<T> _sortedHead;
    SortedNode<T> _sortedTail;    

  }

  public static class IntComparator implements Comparator<Integer>
  {
    public int compare(Integer v1, Integer v2){
      return v1.compareTo(v2);
    }
  }


  public static void main(String[] args){

    SortedLL<Integer> ll = new SortedLL<Integer>(new IntComparator());
    System.out.println("inserting: " + ll.insert(1));
    System.out.println("inserting: " + ll.insert(3));
    System.out.println("inserting: " + ll.insert(2));
    System.out.println("inserting: " + ll.insert(5));
    System.out.println("inserting: " + ll.insert(4));
    ll.printList();
    ll.printSorted();    

    System.out.println("inserting new value");
    System.out.println("inserting: " + ll.insert(3));
    ll.trim();
    ll.printList();
    ll.printSorted();    
  }
}
Nim
A: 

Take a look at the commons-math implementation of the DescriptiveStatistics class (Percentile.java)

axelclk
As far as I see this class doesn't have a function to forget the oldest value.
Christian
In the DescriptiveStatistics class you can set a "window size". Javadoc of the addValue() method: Adds the value to the dataset. If the dataset is at the maximum size (i.e., the number of stored elements equals the currently configured windowSize), the first (oldest) element in the dataset is discarded to make room for the new value. http://commons.apache.org/math/apidocs/src-html/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html#line.150
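
For illustration, a hedged sketch of that usage (commons-math 2.x API; the wrapper class and method names are made up):

import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

class WindowedStats {

    // windowSize = 180 makes addValue() discard the oldest value automatically
    private final DescriptiveStatistics stats = new DescriptiveStatistics(180);

    public int addAndCountSmaller(double x) {
        int smaller = 0;
        for (double v : stats.getValues())   // copy of the current window
            if (v < x)
                smaller++;
        stats.addValue(x);
        return smaller;
    }
}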
axelclk
A: 

180 values is not many, and a simple array with a brute-force search and System.arraycopy() should take less than 1 micro-second (1/1000 of a milli-second) and incurs no GC. It could be faster than playing with more complex collections.
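
An illustrative sketch of that approach (the names here are assumptions, not from the answer):

class ShiftingCache {

    private final int[] values = new int[180];
    private int size = 0;

    public int addAndCountSmaller(int x) {
        int count = 0;
        for (int i = 0; i < size; i++)       // brute-force scan
            if (values[i] < x)
                count++;
        if (size == values.length) {
            // drop the oldest by shifting everything left one slot
            System.arraycopy(values, 1, values, 0, values.length - 1);
            size--;
        }
        values[size++] = x;
        return count;
    }
}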

I suggest you keep it simple and measure how long it takes before assuming you need to optimise it.

Peter Lawrey