I have a file in the format below.

DATE Time, v1,v2,v3
05:33:25,n1,n2,n3
05:34:25,n4,n5,n5
05:35:24,n6,n7,n8
and so on, up to 05:42:25.

I want to calculate the sums of v1, v2 and v3 for every 5-minute interval. I have written the sample code below.

while (<STDIN>) {
    my ($dateTime, $v1, $v2, $v3) = split /,/, $_;
    my ($date, $time) = split / /, $dateTime;
}

I can read all the values, but I need help summing them for every 5-minute interval. Can anyone suggest code that groups the rows by time and adds up the values for each 5-minute block?

Required output

05:33 v1(sum 05:33 to 05:37) v2(sum 05:33 to 05:37) v3(sum 05:33 to 05:37)
05:38 v1(sum 05:38 to 05:42) v2(sum 05:38 to 05:42) v3(sum 05:38 to 05:42)
and so on.
A: 

This has not been tested much, for lack of sample data. For real CSV parsing, use either Text::CSV_XS or Text::xSV rather than the naive split below.

Note:

  • This code does not make sure the output has all consecutive five minute blocks if the input data has gaps.

  • You will have problems if there are time stamps from multiple days. In fact, if the time stamps are not in 24-hour format, you will have problems even if the data are from a single day.

With those caveats, it should still give you a starting point.

#!/usr/bin/perl

use strict;
use warnings;

my $split_re = qr/ ?, ?/;
my @header = split $split_re, scalar <DATA>;
my @data;

my $time_block = 0;

while ( my $data = <DATA> ) {
    last unless $data =~ /\S/;
    chomp $data;
    my ($ts, @vals) = split $split_re, $data;

    my ($hr, $min, $sec) = split /:/, $ts;
    my $secs = 3600*$hr + 60*$min + $sec;

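    # Start a new block for the first record, or once a record falls
    # five minutes (300 s) or more after the start of the current block.
    if ( !@data || $secs >= $time_block + 300 ) {
        $time_block = $secs;
        push @data, [ $time_block ];
    }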

    for my $i (1 .. @vals) {
        $data[-1]->[$i] += $vals[$i - 1];
    }
}

print join(', ', @header);
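for my $row ( @data ) {
    my $ts = shift @$row;
    # $ts is seconds since midnight, not an epoch time, so format it
    # arithmetically; localtime() would add the local time zone offset.
    print join(', ',
        sprintf('%02d:%02d', int($ts / 3600), int($ts % 3600 / 60))
        , @$row
    ), "\n";
}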


__DATA__
DATE Time, v1,v2,v3
05:33:25,1,3,5
05:34:25,2,4,6
05:35:24,7,8,9
05:55:24,7,8,9
05:57:24,7,8,9

Output:

DATE Time, v1, v2, v3
05:33, 10, 15, 20
05:55, 14, 16, 18
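
The first caveat above (no rows for empty five-minute blocks) can be worked around by placing every record on a fixed grid anchored at the first timestamp. The following is only a rough sketch under that assumption, not a drop-in replacement: the sample lines are made up, and the names @lines, @blocks and @report are mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $width = 300;    # block width in seconds (five minutes)

# Made-up sample rows; note the gap between 05:34 and 05:46.
my @lines = (
    '05:33:25,1,3,5',
    '05:34:25,2,4,6',
    '05:46:10,7,8,9',
);

my ( $first, $ncols, @blocks );
for my $line (@lines) {
    my ( $ts, @vals ) = split /,/, $line;
    my ( $hr, $min, $sec ) = split /:/, $ts;
    my $secs = 3600 * $hr + 60 * $min + $sec;

    $first = $secs         unless defined $first;
    $ncols = scalar @vals  unless defined $ncols;

    # Index of the block on a grid that starts at the first timestamp.
    my $idx = int( ( $secs - $first ) / $width );
    $blocks[$idx][$_] += $vals[$_] for 0 .. $#vals;
}

# Walk every grid slot, so blocks with no records show up as zeros.
my @report;
for my $idx ( 0 .. $#blocks ) {
    my $start = $first + $idx * $width;
    my @sums  = @{ $blocks[$idx] || [ (0) x $ncols ] };
    push @report, join( ', ',
        sprintf( '%02d:%02d', int( $start / 3600 ), int( $start % 3600 / 60 ) ),
        @sums );
}
print "$_\n" for @report;
```

With the sample rows this prints three lines, for 05:33, 05:38 and 05:43, and the middle one is all zeros. Unlike the code above, the block after a gap does not restart at the next record's time; it stays on the grid.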
Sinan Ünür
+1  A: 

This code is a variation on the answer by Sinan Ünür, except that:

(1) It uses timelocal, which also accepts a day, month, and year, so it can be extended to sum five-minute intervals on any date.

(2) It handles the case where the final interval is shorter than 5 minutes.

#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
use POSIX qw(strftime);

my ( $start_time, $end_time, $current_time );
my ( $totV1,      $totV2,    $totV3 );          #totals in time bands

while (<DATA>) {
    my ( $hour, $min, $sec, $v1, $v2, $v3 ) =
      ( $_ =~ /(\d+)\:(\d+)\:(\d+)\,(\d+),(\d+),(\d+)/ );

    #convert time to epoch seconds
    $current_time =
      timelocal( $sec, $min, $hour, (localtime)[ 3, 4, 5 ] );    #sec,min,hr

    if ( !$end_time ) {
        $start_time = $current_time;
        $end_time   = $start_time + 5 * 60;    #plus 5 min
    }
    if ( $current_time < $end_time ) {    #still inside the current 5 min band
        $totV1 += $v1;
        $totV2 += $v2;
        $totV3 += $v3;
    }
    else {
        print strftime( "%H:%M:%S", localtime($start_time) ),
          " $totV1,$totV2,$totV3\n";
        $start_time = $current_time;
        $end_time   = $start_time + 5 * 60;    #plus 5 min
        ( $totV1, $totV2, $totV3 ) = ( $v1, $v2, $v3 );
    }
}

#Print results of the final (possibly partial) time band
if ( defined $start_time ) {
    print strftime( "%H:%M:%S", localtime($start_time) ),
      " $totV1,$totV2,$totV3\n";
}

__DATA__
05:33:25,29,74,96
05:34:25,41,69,95
05:35:25,24,38,55
05:36:25,96,63,70
05:37:25,84,65,74
05:38:25,78,58,93
05:39:25,51,38,19
05:40:25,86,40,64
05:41:25,80,68,65
05:42:25,4,93,81

Output:

05:33:25 274,309,390
05:38:25 299,297,322
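
To illustrate point (1), here is a rough sketch of how a full date could be fed in so that blocks may cross midnight. The 'YYYY-MM-DD HH:MM:SS' layout, the sample rows and the helper name to_epoch are all assumptions of mine, since the question only shows times. It uses timegm so the result does not depend on the local time zone; timelocal takes the same arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS' into epoch seconds (UTC).
sub to_epoch {
    my ($stamp) = @_;
    my ( $y, $mo, $d, $h, $mi, $s ) =
      $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
      or die "Unrecognised timestamp: $stamp\n";
    return timegm( $s, $mi, $h, $d, $mo - 1, $y );    # month is 0-based
}

# Made-up rows that straddle midnight.
my @lines = (
    '2010-03-01 23:58:25,1,2,3',
    '2010-03-02 00:01:25,4,5,6',
    '2010-03-02 00:03:25,7,8,9',
);

my ( $start, @tot, @report );
for my $line (@lines) {
    my ( $stamp, @vals ) = split /,/, $line;
    my $now = to_epoch($stamp);

    # Close the current band once a record is 5 minutes or more past its start.
    if ( defined $start && $now >= $start + 5 * 60 ) {
        push @report,
          strftime( '%Y-%m-%d %H:%M:%S', gmtime($start) ) . " @tot";
        undef $start;
    }
    if ( !defined $start ) {
        $start = $now;
        @tot   = (0) x @vals;
    }
    $tot[$_] += $vals[$_] for 0 .. $#vals;
}

# Flush the final (possibly partial) band.
push @report, strftime( '%Y-%m-%d %H:%M:%S', gmtime($start) ) . " @tot"
  if defined $start;

print "$_\n" for @report;
```

The first two rows land in the band that starts at 23:58:25 on 1 March, and the third row, exactly 300 seconds later, opens a new band on 2 March.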
heferav
A: 

This is a good problem for Perl to solve. The hardest part is taking the value from the datetime field and identifying which 5 minute bucket it belongs to. The rest is just hashes.

my (%v1,%v2,%v3);
while (<STDIN>) {
    my ($datetime,$v1,$v2,$v3) = split /,/, $_;
    my ($date,$time) = split / /, $datetime;
    my $bucket = &get_bucket_for($time);
    $v1{$bucket} += $v1;
    $v2{$bucket} += $v2;
    $v3{$bucket} += $v3;
}
foreach my $bucket (sort keys %v1) {
    print "$bucket $v1{$bucket} $v2{$bucket} $v3{$bucket}\n";
}

Here's one way you could implement &get_bucket_for:

my $first_hhmm;
sub get_bucket_for {
    my ($time) = @_;
    my ($hh,$mm) = split /:/, $time;  # looks like seconds are not important

    # buckets are five minutes apart, but not necessarily at multiples of 5 min
    # (i.e., buckets could go 05:33,05:38,... instead of 05:30,05:35,...)
    # Use the value from the first time this function is called to decide
    # what the starting point of the buckets is.
    if (!defined $first_hhmm) {
        $first_hhmm = $hh * 60 + $mm;
    }

    my $bucket_index = int(($hh * 60 + $mm - $first_hhmm) / 5);
    my $bucket_start = $first_hhmm + 5 * $bucket_index;
    return sprintf "%02d:%02d", $bucket_start / 60, $bucket_start % 60;

}
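
To see what the bucket arithmetic does, here is a small stand-alone sketch. It is a stateless variant I made up for illustration (the name bucket_label and the explicit origin argument are not part of the code above); the calculation itself is the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stateless variant of get_bucket_for: the origin (the first
# time seen) is passed in instead of being remembered between calls.
sub bucket_label {
    my ( $time, $origin ) = @_;
    my ( $hh, $mm ) = split /:/, $time;
    my ( $oh, $om ) = split /:/, $origin;

    my $first = $oh * 60 + $om;                          # origin in minutes
    my $index = int( ( $hh * 60 + $mm - $first ) / 5 );  # which 5 min bucket
    my $start = $first + 5 * $index;                     # bucket start in minutes
    return sprintf '%02d:%02d', int( $start / 60 ), $start % 60;
}

my @labels = map { bucket_label( $_, '05:33:25' ) }
  qw(05:33:25 05:37:59 05:38:00 05:42:25 05:43:10);
print "@labels\n";    # prints: 05:33 05:33 05:38 05:38 05:43
```

Note that int() truncates toward zero, so this assumes no time is earlier than the origin, which holds when the input is sorted.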
mobrule
A: 

I'm not sure why you would use the times starting from the first time, instead of round 5 minute intervals (00 - 05, 05 - 10, etc), but this is a quick and dirty way to do it your way:

my %output;
my $last_min = -10; # -10 + 5 is less than any positive int.
while (<STDIN>) {
    my ($dt, $v1, $v2, $v3) = split(/,/, $_);
    my ($h, $m, $s) = split(/:/, $dt);
    my $ts = $m + ($h * 60);
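    # Start a new bucket once we are 5 or more minutes past the current one,
    # so that 05:38 opens a new bucket after 05:33.
    if ($ts >= $last_min + 5) {
        $last_min = $ts;
    }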
    $output{$last_min}{1} += $v1;
    $output{$last_min}{2} += $v2;
    $output{$last_min}{3} += $v3;
}
foreach my $ts (sort {$a <=> $b} keys %output) {
    my $hour = int($ts / 60);
    my $minute = $ts % 60;
    printf("%02d:%02d v1(%i) v2(%i) v3(%i)\n", (
            $hour,
            $minute,
            $output{$ts}{1},
            $output{$ts}{2},
            $output{$ts}{3},
        ));
}

I'm not sure why you would do it this way, but there it is in procedural Perl, as an example. If you need more on the printf format strings, see the sprintf entry in the Perl documentation (perldoc -f sprintf).
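
For comparison, the round five-minute intervals mentioned at the top (00 - 05, 05 - 10, etc.) need no state at all, because each time can simply be rounded down to the clock mark. This is just a sketch: the helper name aligned_bucket and the sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: round an HH:MM[:SS] time down to the enclosing
# five-minute mark on the clock (05:33 becomes 05:30).
sub aligned_bucket {
    my ($time) = @_;
    my ( $h, $m ) = split /:/, $time;
    return sprintf '%02d:%02d', $h, $m - $m % 5;
}

my %sum;
for my $line ( '05:33:25,1,3,5', '05:34:25,2,4,6', '05:35:24,7,8,9' ) {
    my ( $time, @vals ) = split /,/, $line;
    my $bucket = aligned_bucket($time);
    $sum{$bucket}[$_] += $vals[$_] for 0 .. $#vals;
}

# Zero-padded HH:MM keys sort correctly as plain strings.
my @report = map { "$_ @{ $sum{$_} }" } sort keys %sum;
print "$_\n" for @report;
```

Here the first two rows fall into the 05:30 bucket and the third into 05:35, so the bucket boundaries differ from the "start at the first time" scheme the question asks for.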

Jack M.