views:

262

answers:

6

I have a class that dispatches a Success and a Failure event and I need to maintain a statistic on the average number of failure/total number of events in the last X seconds from that class.

I was thinking something along the lines of using a circular linked list and append a success or failure node for each event. Then count the numbers of failure nodes vs. total nodes in the list, but this has two major drawbacks:

  1. I need to constantly scale the list size up/down to account for the "last X seconds" requirement (the number of events per second can change)
  2. I need to constantly loop over the list and count all the events (potentially expensive as I will probably have 100s of such events per second)

Does anyone know of another way to compute average values from a list of samples received in the last X seconds?

+1  A: 

keep you events in a Queue. Just append to the end and remove all the events that are too old from the front. This will at least eliminate Problem 1.

tliff
+5  A: 

You could use a queue, which would allow you to add new events to the end of the queue, and remove expired events from the start of the queue, assuming that events are added in chronological order. In Java, for example, you could use a LinkedList or an ArrayDeque, both of which implement the Queue interface.

If events are not added in chronological order, then a priority queue could be used. Elements would be ordered by their timestamps, and the highest-priority element (i.e. the next element for removal) would be the one with the smallest timestamp. In Java, this data structure is provided by PriorityQueue.

Instead of counting the events periodically, we can just keep two counters, one for the total number of events, and the other for the number of successful events. These counters will be updated whenever we add or remove events from the queue.

Zach Scrivena
a queue is enough. a deque also allows insertion in the front and removal from the back which is not needed, assuming the events are kept in order
tliff
This answer describes it the best out of all the other answers. And tliff, a deque is an efficient implementation of a queue, that's why the C++ standard library will, by default, use it to represent a queue.
Ray Hidayat
@tliff: Thanks, I've corrected my answer.
Zach Scrivena
@Ray: Thanks for the compliment =) I've expanded my answer to include some Java data structures that can be used, the ArrayDeque being one of them.
Zach Scrivena
A: 

It would be more effective to maintain two separate lists, one for the successes and one for the failures. New entries are always appended at the end of the list (i.e. it is sorted by increasing timestamps).

Now when you want to get the numer of successes/failures in the last n seconds, you create a timestamp for now() - n and work the lists. Once you find a timestamp that is greater than this value, you can eliminate all elements before the current one. The length of the list gives you the number uf successes or fails.

If you need to optimise, see if it is more effective to sort the list by decreasing timestamp (i.e. prepending new values) and work the list until you find an element that has a timestamp smaller than your comparison value. Discard this and all following members.

It is hard to say beforehand which scenario will be more effective, so you will have to try it. OTOH if it works well enough, there is no reason to optimise ;-).

Treb
+4  A: 

You should use a sampling frequency (a-la MRTG). Say you only need one second precision and to maintain the average for the past minute, you will have a fixed table of 60 entries referring to the past 60 seconds (including the present one). And also maintain the current global entry.

Each entry consists of an average value and a number of events. Every entry starts at 0 for both value.

When you receive a new event, you change the current and the global entry like that:

average = ((number * average) + 1) / (number + 1)
number = number + 1

At each sampling interval you change the global entry using the oldest entry:

global.average = ((global.number * global.average) - (oldest.number * oldest.average)) / (global.number - oldest.number)
global.number = global.number - oldest.number

And you reset the oldest entry to 0 and start using it as the current one.

kmkaplan
I don't get it - how do you make certain that only the events of the desired time interval are counted?
Treb
Treb: the table can only ever contain events for sampling_frequency*number_of_entries. Every after every sampling period you remove oldest counts. Mmm, I may need to make this more clear now that I reread.
kmkaplan
Right, I'm beginning to see the picture ;-)
Treb
Err… I meant to say it can contain events for number_of_entries/sampling_frequency secondes.
kmkaplan
+1  A: 
Pop Catalin
+1  A: 

How specific are your requirements? If you're allowed to think outside the box a little, a simple geiger-counter algorithm, a.k.a. an infinite impulse response (IIR) digital filter computes a moving "average" (depending on how you define "average"), has a minimal memory footprint and only takes a few lines of code.

Die in Sente