I have a data set with N samples (say, 13, 16, 17, 20), where each sample exceeds the previous one by some increment (3, 1, 3 in this case), and I want to compute various statistics of that second sequence, the increments.

The samples are timestamps that are collected incrementally (i.e. not all samples are available at once), so I want to use boost::accumulators::accumulator_set, as it looks like something that would fit the bill.

I want to be able to do something like this:

accumulator_set< double, features< tag::mean > > acc;
...
acc(13);
acc(16);
acc(17);
acc(20);

...BUT sampling the differences instead of the actual values.

How can I do that with accumulator_set without keeping track of the last value manually?

+2  A: 

The boost accumulators do not have a difference statistic. You could roll your own though:

http://www.boost.org/doc/libs/1_37_0/doc/html/accumulators/user_s_guide.html#accumulators.user_s_guide.the_accumulators_framework.extending_the_accumulators_framework

The best solution in my opinion is just to keep track of the last value added though.

Inverse
+1  A: 

This answer may be a bit more involved than you'd like, but at least it's not as outrageous as I was afraid it might turn out. The idea would be to start by creating an iterator type that acts as an adapter from "normal" algorithms to the Boost accumulator style of algorithms. This is the part that turned out a bit simpler than I really expected:

#ifndef ACCUM_ITERATOR_H_INCLUDED
#define ACCUM_ITERATOR_H_INCLUDED

#include <iterator>

template <class Accumulator>
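class accum_iterator {
protected:
    Accumulator &accumulator;
public:
    // Output-iterator typedefs, spelled out because inheriting from
    // std::iterator is deprecated since C++17.
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;
    typedef Accumulator accumulator_type;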
    explicit accum_iterator(Accumulator& x) : accumulator(x) {}

    // The only part that really does anything: handle assignment by 
    // calling the accumulator with the value.
    accum_iterator<Accumulator>&
        operator=(typename Accumulator::sample_type value) {
            accumulator(value);
            return *this;
    }
    accum_iterator<Accumulator>& operator*() { return *this; }
    accum_iterator<Accumulator>& operator++() { return *this; }
    accum_iterator<Accumulator> operator++(int) { return *this; }
};

// A convenience function to create an accum_iterator for a given accumulator.    
template <class Accumulator>
accum_iterator<Accumulator> to_accum(Accumulator &accum) { 
    return accum_iterator<Accumulator>(accum);
}

#endif

Then comes a part that's somewhat unfortunate. The standard library has an adjacent_difference algorithm that's supposed to produce the stream you want (the differences between adjacent items in a collection). It has one serious problem though: somebody thought it would be useful for it to produce a result collection that was the same size as the input collection (even though there is obviously one more input than there are differences). To do that, adjacent_difference writes a copy of the first input as the first item in the result, so you have to ignore the first value to get anything useful from it.

To make up for that, I re-implemented an algorithm like std::adjacent_difference with one oh-so-minor difference: since there is obviously one fewer result than there are inputs, it only produces one fewer result than inputs, and doesn't put a value that isn't a difference at the front of the result. Combining the two, we get:

#include "accum_iterator.h"
#include <iostream>
#include <vector>

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/mean.hpp>
using namespace boost::accumulators;

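// A re-implementation of std::adjacent_difference, but with sensible outputs.
template <class InIt, class OutIt>
void diffs(InIt in1, InIt in2, OutIt out) { 
    // std::iterator_traits also works when InIt is a raw pointer.
    typedef typename std::iterator_traits<InIt>::value_type value_type;

    // An empty range has no differences (and must not be dereferenced).
    if (in1 == in2)
        return;

    value_type prev = *in1;
    ++in1;
    while (in1 != in2) {
        value_type temp = *in1;
        *out++ = temp - prev;
        prev = temp;
        ++in1;
    }
}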

int main() {
    // Create the accumulator.
    accumulator_set<double, features< tag::mean > > acc;  

    // Set up the test values.
    std::vector<double> values;
    values.push_back(13);
    values.push_back(16);
    values.push_back(17);
    values.push_back(20);

    // Use diffs to compute the differences, and feed the results to the 
    // accumulator via the accum_iterator:
    diffs(values.begin(), values.end(), to_accum(acc));

    // And print the result from the accumulator:    
    std::cout << "Mean:   " << mean(acc) << std::endl;
    return 0;
}
Jerry Coffin
I feel I should have given more info in my question. My actual problem is that I'm continuously collecting timestamp samples in a latency-sensitive system and want to measure jitter, so ideally I need to update my stats incrementally. Therefore, having a collection of samples is undesirable in this case.
Alex B
@Checkers: `diffs` simply takes input from an `input_iterator` and writes results to an `output_iterator`. For demo/testing, the `input_iterator` connects to a collection -- but it could just as easily read data from a file on disk or from a network connection.
Jerry Coffin