I have two fields in my text file which are

timestamp  number

The format of timestamp is hh:mm:ss.mmm

some sample records are

18:31:48.345 0.00345

18:31:49.153 0.00123

18:32:23.399 0.33456

I want to print the averages of records that are no more than 30 seconds apart. What is a good, fast way of doing this?

A: 

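Here is a starting point in awk; it can certainly be optimized further. It groups records into windows of at most 30 seconds, measured from the first record of each group, and prints the average of each group.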

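# Skip blank or incomplete lines so they do not distort the averages.
NF < 2 { next }
# The first record of each group starts a new 30-second window.
count == 0 { startTime = timeToSeconds($1) }
{   currentTime = timeToSeconds($1)
    elapsedTime = currentTime - startTime
    if (elapsedTime > 30.0) {
        # This record is outside the window: report the group, start a new one.
        calculateAverage()
        startTime = currentTime
    }
    print
    sum += $2
    count++
}
END { calculateAverage() }
function timeToSeconds(timeString,    tokens) {
    # Convert hh:mm:ss.mmm to seconds; tokens is a local array.
    # Note: times that wrap past midnight are not handled.
    split(timeString, tokens, ":")
    return tokens[1]*3600.0 + tokens[2]*60.0 + tokens[3]
}
function calculateAverage() {
    # Uses and resets the global vars: count, sum
    if (count == 0) return   # nothing to average, e.g. empty input
    printf "Average: %.4g\n\n", sum / count
    sum = 0.0; count = 0
}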
Hai Vu
A: 

I would start with a scripting language that has built-in date/time operations. For instance, in Ruby you could read and parse two records like this:

require 'time'

t,n = gets.chomp.split(/\s+/)
ts1 = Time.parse(t)

# ...

t,n = gets.chomp.split(/\s+/)
ts2 = Time.parse(t)

Which now allows you to do things like:

diff = ts2 - ts1
if diff > 30
   # difference is greater than 30 seconds
end

Subtracting one Ruby Time object from another returns the difference in seconds as a Float, and Time also offers conversions such as to_f, to_i and to_s, so it is trivial to do calculations as if the parsed timestamps were numeric values.
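
Putting those pieces together, here is a rough sketch of the whole job in Ruby. It is only one way to do it: the method name window_averages and its window parameter are made up for this example, each group is measured from its first record (the same interpretation as the awk answer), and since Time.parse fills in today's date, timestamps that cross midnight are not handled.

```ruby
require 'time'

# Hypothetical helper: groups "hh:mm:ss.mmm value" lines into windows of at
# most `window` seconds (measured from the first record of each group) and
# returns the average value of each group.
def window_averages(lines, window = 30.0)
  averages = []
  start = nil
  sum = 0.0
  count = 0

  lines.each do |line|
    stamp, value = line.split
    next if value.nil?                 # skip blank or incomplete lines

    now = Time.parse(stamp)            # today's date is assumed
    if start.nil?
      start = now                      # first record opens the first window
    elsif now - start > window         # Time - Time gives seconds as a Float
      averages << sum / count          # close the current group
      start = now
      sum = 0.0
      count = 0
    end

    sum += value.to_f
    count += 1
  end

  averages << sum / count if count > 0 # close the last group, if any
  averages
end
```

With the three sample records from the question, the first two land in one group and the third starts a new one. To run it over a file you could pass something like File.foreach('data.txt') (the filename is just a placeholder) and print each returned average.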

ezpz