views: 479

answers: 9
I have the following algorithm, which scans a large circular array (`data`). At certain points in the array, I need to look back at past values (0 = newest data point, n = oldest data point) and determine whether there was a value at least 5% below the current value. I ended up writing an O(n^2) algorithm which works okay, but it doesn't scale.

        const int numberOfDataPointsInPast = 1000;
        int numberOfDataPoints = 0;
        for (int i = numberOfDataPointsInPast; i >= 0; i--)
        {
            double targetPoint = data[i] * 0.95;
            for (int j = i + numberOfDataPointsInPast; j > i; j--)
            {
                if (data[j] <= targetPoint)
                {
                    numberOfDataPoints++;
                    break;
                }
            }
        }

Any idea how I could transform this into a O(n) algo? Thanks!

+6  A: 

While iterating over the array, store the lowest value seen so far. This requires creating a min variable and performing one comparison at each step. Instead of comparing the new value against all previous values, compare it only against the lowest.
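A minimal sketch of this running-minimum idea (illustrative Python, not from the thread; `count_points_with_lower_past` is a hypothetical name, and note it tracks the global minimum rather than a bounded window, which is the limitation discussed in the comments):

```python
def count_points_with_lower_past(data):
    """Count points that have some earlier value <= 95% of them,
    tracking only the running minimum seen so far (O(n) total)."""
    count = 0
    running_min = float("inf")
    for value in data:
        if running_min <= 0.95 * value:
            count += 1
        running_min = min(running_min, value)
    return count
```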

kgiannakakis
I think you need to elaborate a little more. I don't understand how this gives an O(n) algorithm; it seems that you just defer the search until an insertion is performed. In the worst case (which is what big-O measures), you are still O(n^2).
San Jacinto
Depends on whether you can rely on the minimum always being in the window or not.
Donal Fellows
I think this is the correct answer. There is no search required anymore, since any element at or below 95% of the current value qualifies.
tomdemuyt
@tomdemuyt ...but only if you never "lose" values (i.e. no deletions, no overwrites to save memory, no expiration of values based on time, etc.).
San Jacinto
@San Jacinto: I believe that the proposed algorithm successfully converts the code block posted to O(n). This assumes that the data do not change during the analysis.
kgiannakakis
@San Jacinto: No algorithm can address that without locking; I'm not sure what your point is?
tomdemuyt
@tomdemuyt Why is locking necessary? OP didn't mention a need for concurrency. If concurrency is necessary, then the nature of the way he's storing the values implies locking on a shared circular buffer. With due respect, I don't understand your point. My point was simply that, if it's desired, your algorithm precludes the use of deletes for any reason.
San Jacinto
@kgiannakakis my apologies. It appears I misunderstood the problem slightly, so I can see why it was thought I was implying locking. Now that I follow the problem, yours seems most concise and correct, provided that you mean to iterate over the list once and THEN perform your search. +1.
San Jacinto
This does not take into account the 'window' of 1500 previous values.
Adrian Regan
@Adrian and the OP hasn't stated that it's required. It's a little ambiguous, and it's a point raised by others. However, surely you can see how a small tweak would account for this?
San Jacinto
He has stated it, in his code example: `numberOfDataPointsInPast`.
Adrian Regan
This is not going to work. It is not taking into consideration the window of 1500 points.
Martin
@Martin @Adrian he hasn't made it explicitly clear. Granted, there's a strong chance he may have meant that (if you see my earlier comments, i was assuming the same), but it isn't outright stated.
San Jacinto
@San Jacinto: Martin is the poster of the question! So I would say he has made it perfectly clear :-) Of course, you can argue that the content of the question is what counts :D
Moron
@Moron Haha, good point. Sorry, Martin! It could have been worded a little better to remove all ambiguity :)
San Jacinto
+2  A: 

EDIT:

After thinking about it some more, an easy O(n) time algorithm is possible, without the need for RMQ or a tree (see the previous portion of my answer below).

Given an array A[1...n] and a window width W, you need to find the minimum of A[i...i+W] for a given i.

For this, you do the following.

Split A[1...n] into contiguous blocks of size W: B1, B2, ..., B(n/W).

For each block B, maintain two more arrays called BStart and BEnd.

BStart[i] = minimum of B[1], B[2], ..., B[i] (the prefix minima).

BEnd[i] = minimum of the last i elements of B (the suffix minima).

This can be done in O(W) time for each block, and so O(n) time total.

Now given an i, the sub-array A[i...i+W] will span exactly two consecutive blocks, say B1 and B2.

Find the minimum from i to the end of block B1, and from the start of block B2 to i+W, using B1End and B2Start respectively.

This is O(1) time, so total O(n).

For a circular array C[1....n], all you need to do is run the above on

A[1....2n], which is basically two copies of C concatenated together i.e. A[1...n] = C[1...n] and A[n+1 ... 2n] = C[1...n]
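An illustrative sketch of this block decomposition (my own Python, with hypothetical names `build_block_minima` / `window_min`; here the prefix and suffix minima are stored in flat arrays rather than per block):

```python
def build_block_minima(A, W):
    """Precompute per-block prefix (BStart) and suffix (BEnd) minima.
    Blocks have size W, so any window of W+1 elements spans exactly
    two consecutive blocks. O(n) total preprocessing."""
    n = len(A)
    prefix = A[:]      # prefix[i] = min of A[block_start .. i]
    suffix = A[:]      # suffix[i] = min of A[i .. block_end]
    for i in range(1, n):
        if i % W != 0:                      # not the start of a block
            prefix[i] = min(prefix[i], prefix[i - 1])
    for i in range(n - 2, -1, -1):
        if (i + 1) % W != 0:                # not the end of a block
            suffix[i] = min(suffix[i], suffix[i + 1])
    return prefix, suffix

def window_min(prefix, suffix, W, i):
    """Minimum of A[i .. i+W] in O(1): suffix minimum of the left
    block joined with the prefix minimum of the right block."""
    return min(suffix[i], prefix[i + W])
```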


Previous writeup.

Ok. Assuming that I have understood your question correctly this time...

It is possible in O(n) time and O(n) space.

In fact it is possible to change your window size to any number you like, have it different for different elements and still have it work!

Given an array A[1...n], it can be preprocessed in O(n) time and O(n) space to answer queries of the form: What is the position of a minimum element in the sub-array A[i...j]? in constant time!

This is called the Range Minimum Query Problem.

So theoretically, it is possible to do in O(n) time.

Just using a tree will give you O(n log W) time, where W is the size of the window, and will probably work much better than RMQ in practice, as I expect the hidden constants might make RMQ worse.

You can use a tree as follows.

Start backwards and insert W elements. Find the minimum and push it onto a stack. Now delete the first element and insert the (W+1)th element. Find the minimum, push it on the stack.

Continue this way. Total processing time will be O(n log W).

At the end you have a stack of minimums, which you can just keep popping off as you now walk the array a second time, this time in the correct order, searching for 0.95*target.
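A sketch of the same idea without an explicit tree (illustrative Python, not from the answer; `window_minima` is a hypothetical name): a min-heap with lazy deletion of expired entries also yields each window minimum in O(n log W) amortized:

```python
import heapq

def window_minima(data, W):
    """Minimum of each sliding window of width W, using a min-heap
    with lazy deletion of entries that have slid out of the window."""
    heap = []            # entries are (value, index)
    minima = []
    for i, value in enumerate(data):
        heapq.heappush(heap, (value, i))
        if i >= W - 1:
            # Discard heap tops that are no longer inside the window.
            while heap[0][1] <= i - W:
                heapq.heappop(heap)
            minima.append(heap[0][0])
    return minima
```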

Also, your question is not really clear: you say it is a circular buffer, but you don't seem to be doing a modulus operation with the length. And as coded, your algorithm is O(n), not O(n^2), since your window size is a constant.

Moron
@Martin: Does this answer work for you? I believe it is O(n)...
Moron
Emmm… and when you stumble upon a very small element, say zero, it will be linked as the list head and stay there forever, even after it goes out of `numberOfDataPointsInPast`. Am I misunderstanding something?
nailxx
@nailxx: You only need to know if there was some previous element <= 0.95*newValue. So having a zero is great, isn't it? In any case, you can create the linked list on the fly in O(n) time, just before you start searching, by walking the array backwards (in the reverse of the order in the question).
Moron
@nailxx: The size of the linked list will be exactly the same as the number of elements (i.e. we duplicate values), if that is the source of the confusion.
Moron
@nailxx: I misunderstood the question!
Moron
@nailxx, @Martin: I have now changed my answer based on what I understand the question to be now. Let me know if it works...
Moron
@Martin: I have simplified the algorithm and it is now a simple O(n) time and O(n) space algorithm.
Moron
@Martin: People have put some effort into trying to answer your question. The least you can do is acknowledge it. Anyway... good luck.
Moron
+2  A: 

I do not think it is possible to do this in O(n), because a solution that runs in O(n) would let you sort in O(n), and that is not possible (the minimum for comparison sorting is O(n log n)).

EDIT - reduce sorting to this problem

Suppose one can tell, for each point, how many points in the past have a value smaller than x% below it (here x is 5, but x can also be 0, in which case the count is simply the number of smaller points in the past).

Now suppose you want to sort an array of n elements.
If you can get the number of smaller points in the past for all elements in O(n), then whenever point a has a greater value than point b, the count for point a will also be greater than the count for point b (because the array is circular). So this problem actually yields a function from the values to the counts that preserves the order.
Now the new values are bounded between 0 and n, and such values can be sorted in O(n) time.

Correct me if I am wrong (it might be that I did not understand the problem in the first place).

Itay
O(n) is possible, IMO. See my answer.
Moron
Sorry, I have to give this a -1: there is a 0.95 involved, O(n) is possible, and this has got nothing to do with sorting. You need to find whether there is some value <= 0.95*newValue. You don't have to find the exact position, which sorting requires.
Moron
I will edit my answer to explain the reduction to sorting.
Itay
@Itay: First, you have to reduce it the other way: sorting must be reduced to this, not the other way round. Second, O(n) is possible! Did you read my answer?
Moron
But you are assuming that you must sort the data to accomplish the task. This isn't necessarily the case.
San Jacinto
@Moron - I did - but I did not understand what you meant by "only inserting" and "deleting" - what did all that have to do with the problem?
Itay
@San Jacinto - I did not assume that I must sort. What I said is that if I want to sort I can do it with O(n) + O(solving this problem).
Itay
@Itay what did you mean when you said: "I do not think it is possible to do so in O(n) because by solving it with O(n) you can sort it with O(n) and that is not possible. (minimum, for sort is O(nlogn))." Your very point is that this isn't possible because you can't sort the data in O(n) time. What I am saying is that you say it isn't possible because____ but your assumption is wrong. You don't need to sort, so therefore it may be possible with another method.
San Jacinto
@Itay: There is no 'find the number of elements'. The question asks: _is there at least one?_...
Moron
@San Jacinto - I did not assume that I need to sort.
Itay
@Itay: The comment about inserting was irrelevant. Forget it. What is more relevant is that an O(n) solution is possible and you seem to be ignoring it :-)
Moron
@Moron - Indeed, if the problem is to find if there is at least one, I do agree with you :)
Itay
@Itay Ok, my apologies. I intend to think people mean what they say/write. I misunderstood what you meant somehow.
San Jacinto
Since I have misunderstood the question myself, I have rescinded the downvote.
Moron
+1  A: 

You could take the first numberOfDataPointsInPast points in the past and sort them, which is n*log(n). Then do a binary search, log(n), to find the lowest data point that passes the 5% test. That will tell you how many points out of the numberOfDataPointsInPast pass the test, in n*log(n) time, I believe.
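A sketch of this idea for one window (illustrative Python; `count_past_below` is a hypothetical helper): sort the past window once, then a binary search counts the points at or below 95% of the current value:

```python
import bisect

def count_past_below(past_window, current):
    """How many points in the past window are <= 95% of `current`.
    Sorting is O(W log W); each subsequent query is O(log W)."""
    ordered = sorted(past_window)
    return bisect.bisect_right(ordered, 0.95 * current)
```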

Joshua
+2  A: 

You could maintain an array buffArray of numberOfDataPointsInPast elements that contains the current „window” elements sorted in ascending order.

For each iteration:

  • Check whether the lowest element, buffArray[0], is at most 0.95 * the current element, and perform the necessary actions if it is.
  • Remove the element that goes out of the „window” (i.e. the (i+numberOfDataPointsInPast)’th) from buffArray.
  • Add the new element (i.e. the i’th) to buffArray, maintaining sort order.

This is not O(N) as I understand it, but it is definitely more efficient than O(N^2), since adding and removing elements to/from a sorted array is O(log N). I suspect the final efficiency is O(N log W), where W is numberOfDataPointsInPast.

nailxx
darn, you finished posting first.
Jason S
Your post is cleaner, however :) Let both be there to choose from :)
nailxx
It is worst case O(NW) if you use an array. O(N logW) is inaccurate.
Moron
finding elements in a sorted array is O(log N), adding or removing is O(N)
jk
+2  A: 

I think I understand your requirements.... I'm going to restate the problem:

Given: a sliding buffer size K, and a data array of size N > K, indices from 0 to N-1.

Compute: Count the number of points j such that K <= j <= N-1, where the set {data[j-1], data[j-2], data[j-3], ... data[j-K]} contains at least one point with value <= 0.95 * data[j].

This can be accomplished as follows:

  1. Sort the points {data[0], data[1], ... data[K-1]} using a data structure which has at most O(log N) cost for insertion/removal.

  2. Initialize a counter R to 0, initialize j to K.

  3. Check the sorted array to see if the lowest point is <= data[j] * 0.95; if so, increment R.

  4. Remove data[j-K] from the sorted array, and insert data[j] to the sorted array.

  5. Increment j

  6. If j < N, go back to step 3.

The key here is to choose the proper data structure. I am pretty sure a self-balancing binary tree would work. If the incremental insertion cost is O(log N), then your total runtime is O(N log N).
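The steps above can be sketched as follows (illustrative Python, not from the answer; a plain sorted list with `bisect` stands in for the tree, so insertion/removal here is O(K) rather than O(log K), but the counting logic matches steps 1-6):

```python
import bisect

def count_points(data, K):
    """Count indices j (K <= j < N) whose preceding K points contain
    a value <= 0.95 * data[j], per the restated problem."""
    window = sorted(data[:K])        # sorted copy of the first K points
    R = 0
    for j in range(K, len(data)):
        if window[0] <= 0.95 * data[j]:   # lowest past point passes the test
            R += 1
        window.remove(data[j - K])        # drop the expired point
        bisect.insort(window, data[j])    # insert the new point in order
    return R
```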

Jason S
A: 

The iterations need to start from the bottom and increment (keeping the minimum of the past). Right now, as posted, the algorithm always looks backward instead of moving forward and remembering the past minimum.

As new points are added, the range of data points can only extend its upper or lower bound. As the lower bound decreases, keeping only the lower bound is all that is necessary. Any new point at or above the lower bound / 0.95 is acceptable (since the lower bound is always in the past):

const int numberOfDataPointsInPast = 1000;
int numberOfDataPoints = 0;
double lb = double.NaN;                    // running minimum of the past
for (int i = 0; i < numberOfDataPointsInPast; i++)
{
    if (double.IsNaN(lb) || data[i] < lb)  // NaN never compares equal, so test with IsNaN
    {
        lb = data[i];
    }
    if (data[i] >= lb / 0.95)              // equivalent to lb <= 0.95 * data[i]
    {
        numberOfDataPoints++;
    }
}
Cade Roux
This is the same solution presented by another poster.
San Jacinto
A: 

Try this:

Always maintain two pointers to elements within your buffer. One is the minimum value encountered and the other is the next minimum (that is, the next highest by increment). Remember these are pointers into the buffer.

At each step in your progression through the buffer, determine whether the current value is less than or equal to the value pointed to by min1 or min2; if so, update min1 or min2 to point to the current location. Otherwise, if by pointer arithmetic the value of min1 or min2 is 1500 places back in the buffer, determine which one it is and re-adjust accordingly: either min1 is set to point to min2 and min2 to the current location, or min2 is simply set to point to the current location.

Whether the value pointed to by either min1 or min2 is at or below 95% of the current value can then be determined by simple comparison...

Adrian Regan
A: 

You have two options:

  1. Sort it -- O(n log n)

  2. Median of Medians algorithm

J-16 SDiZ