I have a list of timestamps in a text file. I want to figure out the times at which the change is more than a given threshold.

Input format:

10:13:55
10:14:00
10:14:01
10:14:02
10:14:41
10:14:46
10:17:58
10:18:00
10:19:10
10:19:16

If the threshold is, say, 30 seconds, I want the output to list the cases where the change is >= 30 seconds

e.g. 10:14:02 and 10:14:41, or 10:14:46 and 10:17:58.

Solutions in bash, python or ruby would be helpful. Thanks.

A: 

Ruby:

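require 'time'

# Parse each non-empty line into a Time (Time.parse fills in today's date).
times = File.readlines(filename).map(&:strip).reject(&:empty?).map { |line| Time.parse(line) }

# Compare every timestamp with the one before it; Time - Time gives seconds.
times.each_cons(2) do |prev, cur|
  puts "#{prev.strftime('%T')} and #{cur.strftime('%T')}" if cur - prev > 30
end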
ennuikiller
+1  A: 

Python:

from datetime import datetime

data = open("times.txt").read()
lasttime = None  # previous timestamp; None until the first one has been read

for timestamp in [datetime.strptime(datestring, "%H:%M:%S") for datestring in data.split()]:
    if lasttime and (timestamp - lasttime).total_seconds() > 30:
        print(lasttime.time(), "and", timestamp.time())

    lasttime = timestamp
Dave Webb
A: 

In Python:

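import datetime

data = open('filename').read()
# strptime parses the strings; datetime objects (unlike time objects) can be subtracted
times = [datetime.datetime.strptime(x, '%H:%M:%S') for x in data.split()]

for i in range(1, len(times)):
  if times[i] - times[i-1] > datetime.timedelta(seconds=30):
    print(times[i-1].time(), times[i].time())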
Nick Johnson
+2  A: 

I tend to use awk (with a sed filter to break your lines up) for things like that:

echo '10:13:55 10:14:00 10:14:01 10:14:02
      10:14:41 10:14:46 10:17:58 10:18:00
      10:19:10 10:19:16' \
| sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//' -e 's/ /\n/g' \
| awk -F: '
    NR==1 {s=$0;s1=$1*3600+$2*60+$3}
    NR>1 {t1=$1*3600+$2*60+$3;if (t1-s1 > 30) print s" "$0;s1=t1;s=$0}
    '

outputs:

10:14:02 10:14:41
10:14:46 10:17:58
10:18:00 10:19:10

Here's how it works:

  1. It sets the field separator to : for easy extraction.
  2. When the record number is 1 (NR==1), it simply stores the time (s=$0) and number of seconds since midnight (s1=$1*3600+$2*60+$3). This is the first baseline.
  3. Otherwise (NR>1), it gets the seconds since midnight (t1=$1*3600+$2*60+$3) and, if that's more than 30 seconds since the last one, it outputs the last time and this time (if (t1-s1 > 30) print s" "$0).
  4. Then it resets the baseline for the next line (s1=t1;s=$0).
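
For readers more comfortable in Python, the four steps above can be sketched roughly as follows. This is only an illustration of the same baseline-and-compare idea, not a replacement for the awk script: the names `seconds_since_midnight` and `find_gaps` are made up for this example, and it assumes well-formed HH:MM:SS values that all fall on the same day.

```python
def seconds_since_midnight(stamp):
    """Convert 'HH:MM:SS' to seconds since midnight by splitting on ':'."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_gaps(stamps, threshold=30):
    """Return (previous, current) pairs whose gap exceeds threshold seconds."""
    gaps = []
    baseline = None       # plays the role of s in the awk script
    baseline_secs = None  # plays the role of s1 in the awk script
    for stamp in stamps:
        secs = seconds_since_midnight(stamp)
        if baseline is not None and secs - baseline_secs > threshold:
            gaps.append((baseline, stamp))
        # Reset the baseline for the next timestamp.
        baseline, baseline_secs = stamp, secs
    return gaps


sample = ["10:13:55", "10:14:00", "10:14:01", "10:14:02", "10:14:41",
          "10:14:46", "10:17:58", "10:18:00", "10:19:10", "10:19:16"]
for previous, current in find_gaps(sample):
    print(previous, current)
```

With the sample data this prints the same three pairs as the awk version. Like the awk script, it uses a strict `>` comparison, so a gap of exactly 30 seconds is not reported.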

Keep in mind the sed command is probably more complicated than it needs to be in this example - it collapses all space sequences to one space, removes them from the start and end of lines, then converts the remaining spaces into newline characters so that each time ends up on its own line. Depending on the input form of your data (mine is complicated since it's formatted for readability), this may not all be necessary.
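
If the clean-up is done in Python rather than sed, `str.split()` with no arguments covers the same ground in one call: it splits on any run of whitespace (newlines included) and ignores leading and trailing whitespace. A small sketch, where `raw` is just a stand-in for the contents of your file:

```python
raw = """10:13:55 10:14:00 10:14:01 10:14:02
      10:14:41 10:14:46 10:17:58 10:18:00
      10:19:10 10:19:16"""

# With no separator argument, split() treats any run of spaces,
# tabs or newlines as a single delimiter and drops empty strings.
tokens = raw.split()

print(len(tokens))
print(tokens[0], tokens[-1])
```

The resulting list holds one timestamp per element, whether the input had one time per line or several.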

Update: Since the question edit has stated that the input is one time per line, you don't need the sed part at all.

paxdiablo
I tend to use `tr ' ' '\n'` for splitting space-separated fields on multiple lines
mouviciel
@pax, pls check: change the last entry to 10:19:55 for example, and run your script again.
ghostdog74
@ghostdog74, I did that and got the extra line "10:19:10 10:19:55" - that seems right to me. Did you expect something else?
paxdiablo
@pax..my bad. something to do with the getting rid of those "newlines" after i copy and paste from here....anyway, all is good.
ghostdog74
No probs, I thought I was missing something incredibly obvious. It wouldn't be the first time :-) Cheers.
paxdiablo
A: 

@OP, your algorithm just needs to iterate over each field, convert it to seconds, and compare it against its neighbour.

gawk 'BEGIN{threshold=30}
{
 for(i=1;i<=NF;i++){
    split($i,t,":")
    sec = (t[1]*3600) + (t[2]*60) + t[3]
    # compare with the previous timestamp, which may be on the previous line
    if ( prev != "" && (sec - prev_sec) > threshold ){
        print prev, $i
    }
    prev = $i
    prev_sec = sec
 }
}' file

output:

# ./shell.sh
10:14:02 10:14:41
10:14:46 10:17:58
10:18:00 10:19:10
ghostdog74