I have quite an interesting task at work - I need to find out how much time a user spent doing something, and all I have is the timestamps of his saves. I know for a fact that the user saves after each small portion of work, so the saves are not far apart.

The obvious solution would be to find out how much time one small item could possibly take, then walk through the sorted timestamps: if the difference between the current one and the previous one is more than that, the user had a coffee break; if it's less, we can just add the difference to the total sum. Simple example code to illustrate that:

DateTime? prev_timestamp = null;
var total_time = TimeSpan.Zero;
foreach (var timestamp in timestamps) {
    if (prev_timestamp != null) {
        var diff = timestamp - prev_timestamp.Value;
        // Gaps shorter than the threshold count as work; longer ones are breaks.
        if (diff < threshold) {
            total_time += diff;
        }
    }
    prev_timestamp = timestamp;
}

The problem is, while I know roughly how much time one small portion takes, I don't want to depend on it. What if some user is just that much slower than my prediction? I don't want him to be left without a paycheck. So I was thinking: could there be some clever math solution to this problem that works without knowing what time interval is acceptable?

PS. Sorry for the misunderstanding; of course no one would pay people based on these numbers, and even if they did, they would understand that it is just an approximation. But I'd like to find a solution that produces numbers as close to real life as possible.

+5  A: 

You could take the median TimeSpan, and then discard those TimeSpans which are off by, say, more than 50%.
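
A minimal sketch of that idea in the style of the question's snippet, assuming timestamps is a sorted List&lt;DateTime&gt; with at least two entries; only the upper cutoff is applied here, since short gaps are still work time:

var diffs = new List<TimeSpan>();
for (int i = 1; i < timestamps.Count; i++) {
    diffs.Add(timestamps[i] - timestamps[i - 1]);
}

// The median is robust against the few large coffee-break gaps.
var sorted = new List<TimeSpan>(diffs);
sorted.Sort();
var median = sorted[sorted.Count / 2];

var total_time = TimeSpan.Zero;
foreach (var diff in diffs) {
    if (diff.Ticks <= median.Ticks * 3 / 2) {   // within 150% of the median
        total_time += diff;
    }
}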

But IMHO this algorithm should only be used to get estimated hours spent per project, not for payroll.

Groo
I agree with you, Groo. I definitely wouldn't want to work at that place... ;)
Paulo Santos
This is obvious too. I'd really like something with math that could calculate it in one go.
vava
+1  A: 

Grab all the periods and look at the average? If some are far outside the average span, you could discard them or use an adjusted value for them in the average.
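
A rough sketch of that, reusing the diffs list of gaps from the median example above; the 2x cutoff and the choice to substitute the mean for outliers are arbitrary illustrations, not part of this answer:

long sum = 0;
foreach (var diff in diffs) {
    sum += diff.Ticks;
}
var mean = TimeSpan.FromTicks(sum / diffs.Count);

// Gaps far above the mean are treated as breaks and counted
// at the mean's length instead of their full length.
var total_time = TimeSpan.Zero;
foreach (var diff in diffs) {
    total_time += (diff.Ticks <= mean.Ticks * 2) ? diff : mean;
}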

I agree with Groo that using something based only on the 'save' timestamps is NOT what you should do - it will NEVER give you the actual time spent on the tasks.

Fake51
+1  A: 

The clever math you seek is called "standard deviation".
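
For instance, a sketch along those lines, again assuming the diffs list of gaps from the earlier examples; the two-sigma cutoff is an arbitrary choice:

double mean = 0;
foreach (var diff in diffs) {
    mean += diff.Ticks;
}
mean /= diffs.Count;

double variance = 0;
foreach (var diff in diffs) {
    variance += (diff.Ticks - mean) * (diff.Ticks - mean);
}
double stdDev = Math.Sqrt(variance / diffs.Count);

// Count only gaps within two standard deviations of the mean as work time.
var total_time = TimeSpan.Zero;
foreach (var diff in diffs) {
    if (diff.Ticks <= mean + 2 * stdDev) {
        total_time += diff;
    }
}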

Christopher Morley
+2  A: 

You need to look either at the standard deviation for the group of all users, or at the variance in the intervals for a single user, or, better, at a combination of the two for your sample set.

SpaceghostAli