I have recently run into a situation where I need to trim some rather large log files once they grow beyond a certain size. Everything but the last 1000 lines in each file is discarded; the job is run every half hour by cron. My solution was to simply run through the list of files, check the size, and trim if necessary.

for my $file (@fileList) {
    if ( ((-s $file) / (1024 * 1024)) > $CSize ) {   # size in MB exceeds threshold
        open FH, $file or die "Cannot open ${file}: $!\n";
        my $lineNo = 0;
        my @tLines;

        while (<FH>) {
            push @tLines, $_;
            shift @tLines if ++$lineNo > $CLLimit;   # keep only the last $CLLimit lines
        }
        close FH;

        open FH, ">$file" or die "Cannot write to ${file}: $!\n";
        print FH @tLines;
        close FH;
    }
}

This works in its current form, but there is a lot of overhead for large log files (especially ones with 100,000+ lines) because every line has to be read in and possibly shifted off the array.

Is there any way I could read in just a portion of the file? In this instance I only need access to the last "$CLLimit" lines. Since the script is being deployed on a system that has seen better days (think Celeron 700 MHz with 64 MB RAM), I am looking for a quicker alternative using Perl.

+3  A: 

Estimate the average length of a line in the log - call it N bytes.

Seek backwards from the end of the file by 1000 * 1.10 * N bytes (the factor 1.10 gives a 10% margin for error in the estimate). Read forward from there, keeping just the most recent 1000 lines.


The question was asked: which function or module?

The built-in function seek looks to me like the tool to use.
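
As a rough sketch of that approach (not from the original answer; the file name, the 80-byte average line length, and the rewrite-in-place step are assumptions), the built-in seek together with Fcntl's SEEK_END constant could look like this:

use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);

my $file    = 'app.log';   # hypothetical log file
my $CLLimit = 1000;        # number of lines to keep
my $avg_len = 80;          # estimated average line length, N bytes

open my $fh, '<', $file or die "Cannot open $file: $!\n";
my $offset = int($CLLimit * 1.10 * $avg_len);   # 10% margin for error
my $size   = -s $fh;

if ($size > $offset) {
    seek $fh, -$offset, SEEK_END or die "seek failed: $!\n";
    <$fh>;                      # discard the (probably partial) line we landed in
}
else {
    seek $fh, 0, SEEK_SET;      # small file: just read the whole thing
}

my @tLines;
while (<$fh>) {
    push @tLines, $_;
    shift @tLines if @tLines > $CLLimit;   # keep only the last $CLLimit lines
}
close $fh;

open $fh, '>', $file or die "Cannot write to $file: $!\n";
print {$fh} @tLines;
close $fh;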

Jonathan Leffler
I did think of something like this, but with my rather limited knowledge of Perl I wouldn't know which modules to use to get the task done. For example, what function would I use to seek backwards in a file in Perl?
muteW
+7  A: 

I realize you want to use Perl, but if this is a UNIX system, why not use the "tail" utility to do the trimming? You could do this in bash with a very simple script:

if [ `stat -f "%z" "$file"` -gt "$MAX_FILE_SIZE" ]; then
    # note: stat -f "%z" is BSD stat syntax; GNU/Linux stat uses -c "%s"
    tail -n 1000 "$file" > "$file.tmp"
    # copy and then rm (rather than mv) so the original file keeps its inode
    cp "$file.tmp" "$file"
    rm "$file.tmp"
fi

That being said, you would probably find this post very helpful if you're set on using Perl for this.
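
For what it's worth, a minimal Perl version that shells out to tail in the same way (the file name and size threshold here are made up, not from the linked post, and it assumes a Unix system with tail and cp on the PATH) might look like:

use strict;
use warnings;

my $file     = 'app.log';           # hypothetical log path
my $max_size = 10 * 1024 * 1024;    # 10 MB threshold (assumed)

if ( (-s $file) > $max_size ) {
    system(qq(tail -n 1000 "$file" > "$file.tmp")) == 0
        or die "tail failed: $?\n";
    # copy and then remove so the original file keeps its inode
    system('cp', "$file.tmp", $file) == 0 or die "cp failed: $?\n";
    unlink "$file.tmp" or warn "could not remove $file.tmp: $!\n";
}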

Jay
Thanks for the bash example; I have something similar for this task, but I am in the process of converting all my bash scripts to Perl and so needed some guidance. The Perl Monks tutorial looks promising; I'll have a look at it later.
muteW
Even without a unix system, you can get tail for other OSes. :)
brian d foy
+3  A: 

Consider simply using the logrotate utility; it is included in most modern Linux distributions. A related tool for BSD systems is called newsyslog. These tools are designed more or less for your intended purpose: they atomically move a log file out of place, create a new file (with the same name as before) to hold new log entries, instruct the program generating messages to use the new file, and then (optionally) compress the old file. You can configure how many rotated logs to keep. Here's a tutorial: http://www.debian-administration.org/articles/117
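
As a rough illustration (the log path, size threshold, and HUP signal below are placeholders, not taken from this answer), a size-based logrotate entry might look like:

# hypothetical /etc/logrotate.d/myapp entry
/var/log/myapp.log {
    # rotate once the file exceeds 10 MB, keep four compressed copies
    size 10M
    rotate 4
    compress
    missingok
    notifempty
    postrotate
        # tell the logging program to reopen its file (command is app-specific)
        kill -HUP `cat /var/run/myapp.pid`
    endscript
}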

It is not precisely the interface you want (keeping a certain number of lines), but the program will likely be more robust than anything you cook up on your own; for example, the answers here do not deal with atomically moving the file and notifying the logging program to use a new file, so there is a risk that some log messages will be lost.

Emil