// Java programmers, when I say method, I mean a 'way to do things'...

Hello All,

I'm writing a log miner script to monitor various log files at my company. It's written in Perl, though I have access to Python and, if I REALLY need to, C (though my company doesn't like binary files). It needs to go through the last 24 hours of logs, take each log code, and check whether we should ignore it or email the appropriate people (me). The script would run as a cron job on Solaris servers. Here is what I had in mind (this is only pseudo-ish... and badly written pseudo):

main()
{
    $today = Get_Current_Date();
    $yesterday = Subtract_One_Day($today);
    `grep $yesterday '/path/to/log' > /tmp/log`;         # Get logs from previous day
    `awk '{print $X}' /tmp/log > /tmp/log_codes`;        # Get log code (X = field holding the code)
    SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');
}

Another thought was to load the log file into memory and read it there... that is all fine and dandy except for two small problems.

  1. These servers are production servers and serve a couple million customers...
  2. The log files average 3.3 GB each (about two days' worth of logs)

So not only would grep take a while to go through each file, it would also use up CPU and memory that are needed elsewhere. And loading a 3.3 GB file into memory is not the wisest of ideas (at least IMHO). I had a crazy idea involving assembly code and memory locations, but I don't know SPARC assembly, sooo flush that idea.
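
A middle ground between shelling out to grep and loading the whole file is to stream it one line at a time, doing the date filter and the code extraction in a single pass. Here is a minimal sketch in Python (which is available on these boxes). The names `scan_log`, `day_prefix`, and `code_field` are made up for illustration, and the log layout (a date string somewhere on each line, the code in a whitespace-separated column) is an assumption, not the real format.

```python
from collections import Counter


def scan_log(path, day_prefix, code_field):
    """Stream the log once and count the codes on lines containing day_prefix.

    Only one line is held in memory at a time, so the size of the file
    affects run time but not memory use.

    day_prefix and code_field are assumptions about the log layout:
    the date string to look for, and the zero-based column of the code.
    """
    counts = Counter()
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if day_prefix not in line:
                continue  # not from the day we care about
            fields = line.split()
            if len(fields) > code_field:
                counts[fields[code_field]] += 1
    return counts
```

Something like `scan_log('/path/to/log', '2010-01-31', 2)` would then replace both the grep and the awk step, without the two temp files. It still has to read the whole file from disk, so it saves memory rather than I/O.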

Anyone have any suggestions?

Thanks for reading this far =)

+2  A: 

Possible solutions: 1) have the system start a new log file every midnight -- this way you could mine the finite-size log file of the previous day at a reduced priority; and 2) modify the logging system so that it automatically extracts certain messages for further processing on the fly.

Steve Emmerson
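
The first suggestion (mine yesterday's rolled file at reduced priority) could look roughly like the sketch below. It is a sketch under assumptions only: the `<base>.<YYYY-MM-DD>` naming of rolled files is hypothetical and depends on how rotation is actually configured, and `rolled_log_path`, `mine_at_low_priority`, and `handle_line` are illustrative names. `os.nice` is the standard Unix call for raising a process's nice value.

```python
import os
from datetime import date, timedelta


def rolled_log_path(base_path, today=None):
    """Return the path of yesterday's rolled file.

    The '<base>.<YYYY-MM-DD>' suffix is an assumed naming scheme,
    not something the logging setup in the question is known to use.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return f"{base_path}.{yesterday.isoformat()}"


def mine_at_low_priority(path, handle_line, niceness=10):
    """Lower this process's scheduling priority, then stream the file.

    os.nice() adds `niceness` to the current nice value (Unix only),
    so the scan yields the CPU to the production workload.
    Returns the number of lines passed to handle_line.
    """
    os.nice(niceness)
    seen = 0
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            handle_line(line.rstrip("\n"))
            seen += 1
    return seen
```

Because the rolled file is no longer being written to, the scan works on a fixed amount of data, and the nice value only matters while the machine is actually busy.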
This is a step in the right direction. If log files are of unmanageable size, rolling them at specified intervals or sizes is a good idea, and is typically supported natively by the logging framework. Arguably, having a separate log for high-priority entries might also work, although it assumes that priority is known in advance.
Steven Sudit
Thank you Steve and Steven. I'll see if I can do something of that nature; it should be fairly simple. Also, we don't really have a separate log for high-priority entries. This is where I come in with this script: it decides which logs are high-priority, using a conf file where people can add log codes themselves to either monitor or ignore. Again, thanks guys. I'll take a look in that direction and get back to you.
w3b_wizzard
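
The conf file idea in the last comment might be handled along these lines. The file format here (one `monitor CODE` or `ignore CODE` per line, with `#` comments) is purely an assumption for illustration, as are the names `parse_rules` and `codes_to_report`. The thread does not say how codes missing from the file should be treated, so this sketch simply leaves them out of the report.

```python
def parse_rules(lines):
    """Parse 'monitor CODE' / 'ignore CODE' lines into a {code: action} dict.

    Blank lines and anything after '#' are skipped. The format is a
    hypothetical one, not the actual conf file from the thread.
    """
    rules = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2 or parts[0] not in ("monitor", "ignore"):
            raise ValueError(f"bad rule: {raw!r}")
        action, code = parts
        rules[code] = action  # a later line for the same code wins
    return rules


def codes_to_report(seen_codes, rules):
    """Return, sorted, the seen codes that the rules mark as 'monitor'."""
    return sorted(code for code in set(seen_codes) if rules.get(code) == "monitor")
```

For example, with rules parsed from `["monitor E100", "ignore W200"]`, a day whose log contained E100, W200, and X999 would report only E100.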