+4  Q: 

Date Range Problem

I have a log file in which every line begins with a timestamp:

2010-06-01 04:56:02,802 DEBUG {Thread-27} Some text message

2010-06-01 04:56:02,802 DEBUG {Thread-27} Some text message

2010-06-01 04:56:02,802 DEBUG {Thread-27} Some text message

2010-06-01 04:56:02,802 DEBUG {Thread-27} Some text message

2010-06-01 05:22:02,802 DEBUG {Thread-27} Some text message

2010-06-01 05:22:02,802 DEBUG {Thread-27} Some text message

2010-06-01 05:22:02,802 DEBUG {Thread-27} Some text message

2010-06-01 05:22:02,802 DEBUG {Thread-27} Some text message

2010-06-01 06:43:02,802 INFO {Thread-27} Some text message

2010-06-01 06:43:02,803 INFO {Thread-27} Some text message

2010-06-01 06:43:02,804 INFO {Thread-27} Some text message

2010-06-01 06:43:02,804 INFO {Thread-27} Some text message

2010-06-01 06:43:02,809 DEBUG {Thread-27} Some text message

2010-06-01 06:43:02,809 DEBUG {Thread-27} Some text message

2010-06-01 06:43:02,809 DEBUG {Thread-27} Some text message

2010-06-01 07:08:02,809 DEBUG {Thread-27} Some text message

2010-06-01 07:08:02,809 DEBUG {Thread-27} Some text message

My aim is to find all the lines whose timestamp is within 1 hr before the current time.

How can this be achieved?

+1  A: 

Since the timestamps are going to be sorted, you can try a kind of Binary Search with a twist.

Since the lines will mostly not be the same length, you cannot jump straight to a given line number. Instead, seek to a certain byte offset and look for the newlines (or whichever line terminator you have) that appear before and after it (or after it and the one after that); the text between them is a candidate line. Now compare the date on that line to the one you are looking for and decide whether to seek again, or just look around in the neighbourhood of this line.
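The seek-and-resync idea above can be sketched in Perl roughly as follows. This is only an illustration, not the answerer's code: the sub names are made up here, and it assumes "\n" line terminators, lines sorted by time, and a threshold string in the same 'YYYY-MM-DD HH:MM:SS' layout as the log.

```perl
use strict;
use warnings;

# Position $fh at the first line that starts at or after byte $pos and
# return that line's starting offset (the file size if there is none).
sub line_start_at_or_after {
    my ($fh, $pos) = @_;
    if ($pos == 0) {
        seek($fh, 0, 0) or die "seek failed: $!";
        return 0;
    }
    seek($fh, $pos - 1, 0) or die "seek failed: $!";
    my $partial = <$fh>;    # discard up to and including the next newline
    return tell $fh;
}

# Binary search over byte offsets. Returns the offset of the first line
# whose leading timestamp is ge $threshold (the file size if none) and
# leaves $fh positioned there.
sub first_offset_at_or_after {
    my ($fh, $threshold) = @_;
    my ($lo, $hi) = (0, (-s $fh) || 0);

    while ($lo < $hi) {
        my $mid   = int(($lo + $hi) / 2);
        my $start = line_start_at_or_after($fh, $mid);
        my $line  = <$fh>;                    # candidate line, undef at EOF

        if (!defined $line
            || substr($line, 0, length $threshold) ge $threshold) {
            $hi = $mid;                       # first match starts at or before here
        }
        else {
            $lo = $start + 1;                 # first match starts after this line
        }
    }
    return line_start_at_or_after($fh, $lo);
}

# Convenience wrapper (hypothetical): all lines from the threshold onwards.
sub lines_at_or_after {
    my ($path, $threshold) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    first_offset_at_or_after($fh, $threshold);
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

Calling lines_at_or_after('logfile.txt', '2010-06-01 06:00:00') would then return only the tail of the file, having read a handful of probe lines rather than everything before it.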

In determining what offset to seek to next, you could try using something similar to what Interpolation Search does, i.e. decide the offset based on the difference between the time of the line you got and the time you are searching for.
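As a rough sketch of that interpolation step: given the offsets and times at both ends of the current search range, the next offset can be guessed in proportion to where the target time falls between them. The sub names below are invented for illustration, and the timestamp layout is assumed to match the sample log.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert the leading 'YYYY-MM-DD HH:MM:SS' of a log line to epoch seconds.
# timegm is used so the result does not depend on the local time zone;
# only differences between timestamps matter here.
sub line_epoch {
    my ($line) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $line =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)/
        or die "No timestamp at start of line: $line";
    return timegm($s, $mi, $h, $d, $mo - 1, $y);
}

# Guess the byte offset of $target (epoch seconds), assuming log volume
# is spread evenly over time between the two known points.
sub interpolated_offset {
    my ($lo_off, $lo_time, $hi_off, $hi_time, $target) = @_;

    # No usable time span: fall back to the plain midpoint.
    return int(($lo_off + $hi_off) / 2) if $hi_time <= $lo_time;

    my $fraction = ($target - $lo_time) / ($hi_time - $lo_time);
    $fraction = 0 if $fraction < 0;    # clamp to the current range
    $fraction = 1 if $fraction > 1;
    return int($lo_off + $fraction * ($hi_off - $lo_off));
}
```

Real logs are often bursty, so the guess can land far from the target; falling back to an ordinary midpoint when a guess fails to shrink the range keeps the worst case bounded.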

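On a large file this should be much faster than a linear search, since each probe discards roughly half of the remaining range instead of reading it.

For an example of doing binary search in files using Perl: http://perl.plover.com/yak/lightweight-db/materials/slides/slide024.html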

Moron
Perl solutions tend to focus more on text-processing. Seeing that the OP's only after logging a few lines, is binary search really worth it?
Zaid
@Zaid: Does it really matter that OP is trying to use perl to do it? The OP never said that the log file was small, so I don't know where you got that from. Of course, I do agree that OP's problem could actually have been trying to determine what the log time of a particular line is, but that isn't clear from the question.
Moron
I didn't say that the log file was small. What I said was that the OP is only after a few lines. The OP's need is clear: Determine those lines with timestamps that are within an hour of the current time. The question's tagged `perl`, so I'm assuming it wants an answer in Perl... I'm not saying your answer is wrong, but I've never seen such a problem tackled with a binary-search approach, probably because one would have to load the whole file into memory.
Zaid
@Zaid: Since the OP is after only a few lines, binary search is ideal! You don't want to read in a lot of lines just to get to 3 of them. And you can do binary search _without_ having to read the whole file into memory! Perl supports seeking in files, doesn't it?
Moron
@Moron : I'm intrigued. Could you post some pseudo-code (or actual code) to show how you would do it?
Zaid
@Zaid: Perhaps this would give you an idea: http://perl.plover.com/yak/lightweight-db/materials/slides/slide024.html
Moron
@Moron : Good stuff! I'd post it in your answer for the sake of completeness if I were you...
Zaid
@Zaid: Thanks and sure, I will add the link to my answer.
Moron
+3  A: 

The DateTime module is well-suited to the needs of this problem:

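use strict;
use warnings;
use DateTime;

# DateTime->now defaults to UTC; 'local' assumes the log is written in local time
my $oneHourAgo = DateTime->now( time_zone => 'local' )->subtract( hours => 1 );
my $threshold  = join ' ', $oneHourAgo->ymd, $oneHourAgo->hms;  # Time as string

open my $logFile, '<', 'logfile.txt'
    or die "Cannot open logfile.txt: $!";

while (my $log = <$logFile>) {

    my ($time) = split /,/, $log;       # Gets current log's time (text before the first comma)

    # String-compares log's time to threshold; $log keeps its newline,
    # so matching lines are printed one per line
    print $log if $time ge $threshold;
}

close $logFile;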
Zaid
+1 for using `ge` rather than (more expensively) converting each timestamp to compute a `DateTime::Duration` object...
pilcrow
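To illustrate why the plain `ge` comparison is enough: the timestamps are fixed-width with the most significant field first, so string order matches chronological order. The sketch below makes the same comparison using only core modules (POSIX strftime is swapped in for DateTime here; the sub names are invented and local time is assumed).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the 'YYYY-MM-DD HH:MM:SS' string for one hour before $now
# (epoch seconds), in local time.
sub threshold_string {
    my ($now) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', localtime($now - 3600));
}

# True if the line's leading timestamp is at or after $threshold.
# No date parsing is needed: both strings share the same layout.
sub is_recent {
    my ($line, $threshold) = @_;
    return substr($line, 0, length $threshold) ge $threshold;
}
```

A filter over the log then reduces to something like: print $line if is_recent($line, threshold_string(time)).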
A: 

Does the order in which the lines are output matter? If you don't mind having them with the most recent first, you might consider using File::ReadBackwards. Keep reading backwards until a line is more than one hour old, then stop. If you want them in a particular order, you could store them in an array and print them however you want. (This assumes that it's a more-or-less standard log file with the most recent entries at the end of the file.)
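A minimal sketch of that approach, assuming File::ReadBackwards is installed from CPAN and that every line starts with a timestamp (continuation lines such as stack traces would need extra handling). The sub names are made up for illustration; the stopping logic is kept separate from the module so it can be exercised with any newest-first line source.

```perl
use strict;
use warnings;

# Collect lines newest-first until one is older than $threshold.
# $next_line is any code ref that returns lines from the end of the log
# towards the start, and undef when there are no more.
sub collect_recent {
    my ($next_line, $threshold) = @_;
    my @recent;
    while (defined(my $line = $next_line->())) {
        last if substr($line, 0, length $threshold) lt $threshold;
        push @recent, $line;
    }
    return reverse @recent;    # restore chronological order (list context)
}

# Hypothetical wrapper that feeds collect_recent from File::ReadBackwards.
sub recent_lines_from_file {
    my ($path, $threshold) = @_;
    require File::ReadBackwards;    # loaded lazily; CPAN module, not core
    my $bw = File::ReadBackwards->new($path)
        or die "Cannot read '$path': $!";
    return collect_recent(sub { $bw->readline }, $threshold);
}
```

Because reading stops at the first line that is too old, the cost depends on how many recent lines there are, not on the size of the file.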

David Wall