
I'm maintaining some website code that will soon dump all its errors and warnings into a log file. To make this a bit more proactive, I plan to parse this log file daily, summarize the warnings and errors (i.e. count the occurrences of each distinct one and group them by warning/error) and then email this to the devs on the project.

While this would admittedly be rather trivial with a hash and some further fiddling, I wondered whether there is a suitable module on CPAN that I could use for this task.

It would either be one that summarizes specifically Perl error/warnings logs or one that summarizes arbitrary text files. Any suggestions?

Edit:

The site I'm maintaining was inherited in a state where it generates 50 MB of warnings per day. I'm ONLY looking for a bandaid I can apply to point out the most prolific ones. Log4perl can come in once I run out of critical stuff to fix, but right now it's not an option.
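For reference, the "hash and some further fiddling" approach mentioned in the question might look roughly like the bash function below, which uses an awk associative array as the hash. It is only a sketch under stated assumptions: it assumes Apache-style error-log lines like the sample quoted in the last answer, treats the second bracketed field (e.g. `[error]`, `[warn]`) as the category, and counts any non-matching line under "other". The name `summarize_log` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count each distinct message with an awk associative array
# (the "hash") and group the counts by severity.
# Assumption: lines follow the Apache error-log layout, e.g.
#   [Mon Apr 26 15:39:34 2010] [error] [client 1.2.3.4] message text
# The second bracketed field is used as the category; lines that don't
# match are counted under "other". summarize_log is a hypothetical name.
summarize_log() {
    awk '
    {
        if (match($0, /^\[[^]]*] \[[^]]*] /)) {
            hlen = RLENGTH
            header = substr($0, 1, hlen)
            msg = substr($0, hlen + 1)
            # Category: the text inside the second pair of brackets
            match(header, /] \[[^]]*]/)
            level = substr(header, RSTART + 3, RLENGTH - 4)
            # Drop the per-request client address so repeats collapse
            sub(/^\[client [^]]*] /, "", msg)
        } else {
            level = "other"
            msg = $0
        }
        count[level SUBSEP msg]++
    }
    END {
        # Emit: category <TAB> count <TAB> message
        for (key in count) {
            i = index(key, SUBSEP)
            printf "%s\t%d\t%s\n", substr(key, 1, i - 1), count[key], substr(key, i + 1)
        }
    }' "$1" | sort -t $'\t' -k1,1 -k2,2nr   # group by category, most frequent first
}
```

Calling `summarize_log error_log` prints one tab-separated line per distinct message; piping that into `mail` would give the daily report. Stripping the `[client ...]` field matters because otherwise the same warning from different visitors would never collapse into one count.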

+1  A: 

Not a CPAN module, but the loganalysis site has some very useful tools and info on log parsing and analysis.

Also, log_analysis may be worth looking at, as it is implemented in Perl.

bignum
+1  A: 

There isn't going to be a magical module that you can throw at any log format, including the made-up ones people use locally. Is there something special about your log format? Do you have a printf-style description of it? Does it look like a widely used format for something else?

If you get to choose the format of the error message, make it look like something that a tool you like can understand.

You might also consider using something like Log4perl. Not only can you specify any format that you like, but you can send the output anywhere you like. You can even send the output to a database, fully normalized, so that your summarizer is really just some SQL.

Update

You clarify in a comment (although you did not edit your question to clarify) that this is for warnings and errors emitted from perl. In that case, it sounds like the developers need a proper test suite to catch all of that stuff. If you're putting code into production with the plan of catching warnings there, you have a broken process.

brian d foy
You misunderstood a bit. The log wouldn't be arbitrary. It would simply contain the basic Perl error and warning messages, one per line. Thinking about it, though, Log4perl might really be the best option, even though it would mean delaying this whole thing until I can find time to set it up. I was really only looking for something small I can drop in now and just run with.
Mithaldu
I didn't misunderstand: you didn't specify what "errors and warnings" are. Log4perl doesn't take that long to set up, either.
brian d foy
+1 for Log4perl. I use it in a production environment to separate various warnings and errors into separate files right from the start (separated by type of error, system component of origin etc), where the regular email reports simply send out those files verbatim. It's much easier to separate the content at the source end, so that the downstream consumers don't have to then parse it back apart again and try to figure out how important each line is.
Ether
You did misunderstand. Don't try to talk your way out of this. I mentioned CPAN and Perl, which creates a context in which the combined words "errors and warnings" carry a unique and specific meaning. There was no need to clarify. Furthermore: you obviously never inherited a website that was broken from the get-go and generates warnings to the tune of 50 MB of logs per DAY. I'm only looking for a quick bandaid that'll help me figure out which are the most prolific ones. Once I'm done with that sort of stuff, I can think about dropping in something monolithic like Log4perl.
Mithaldu
I gave you a general solution for a very vague problem statement, and my answer still holds. I know that you are upset because you're in the middle of fighting this fire, but don't transfer that to the people trying to help you. 50 MB isn't that big, and from the many, many websites I've salvaged, I'd bet that the quick fix is in only a handful of places. Be careful what you assume about people you don't know, and be even more careful about blaming them for the shortcomings in your own question.
brian d foy
For what it's worth: I'm actually more annoyed by the tendency of this community to be unable to say "no, there is no answer for this" and to then decide what the question should have been. Also, the site is very thoroughly messed up. It's a 375 KB .pm file of nothing but code, with warnings happening literally everywhere.
Mithaldu
The first step is to turn off all warnings for the version you have in production. You then adjust your development process so that you catch warnings before you put them into production. For all the roadblocks you can throw at me, I'll have a way to deal with them because I do this quite a bit. And, for what it's worth, you were the only one in this thread who gave the "there is no answer for this" answer. I don't think technology is your real problem here.
brian d foy
The CPAN module I asked for is this: IN: a text file; OUT: an email mentioning each unique line and its count, possibly categorized. No one has mentioned anything like this; people have only given me solutions to different problems. This is a fact. Also, I only mentioned the scope of the project to point out that a few quick fixes won't do it. And you're barking up the wrong tree: I don't create new warnings with my code, since I actually take care of that and write tests too. I'm concerned with the existing bit-rot. Your solution would be: "Ignore it."?
Mithaldu
My immediate solution is to turn off warnings in production until you can fix them. If you don't want help, just go away. You're wasting your time fighting with people on Stack Overflow and rejecting most answers when it sounds like you have a lot of work to do.
brian d foy
I think a few minutes spent clarifying myself are worth it, as it may give readers some insight into why they get such a reaction. Either way, the last post was mostly to find out whether you knew anything about development that I didn't. Time spent on potentially learning something new is always worth it.
Mithaldu
+2  A: 

I think looking on CPAN for something as specific and simple as this might be overkill. Assuming the log file is in the default Apache error log format:

[Mon Apr 26 15:39:34 2010] [error] [client 69.12.220.202] syntax error at /var/www/cgi-bin/errortest.cgi line 8, near "{}"

Here's a quick one-liner that mails the errors, sorted by highest occurrence, to an address. It's easily changed to mail multiple addresses (or make an alias that goes to multiple addresses and send to that).

perl -ple 's/^\[\w+\s\w+\s\d+\s\S+\s\d+\]\s\[[^\]]+\]\s(?:\[client [^\]]+\]\s)?//;' LOGFILE |
sort |     # Sort messages so identical ones are adjacent
uniq -c |  # Collapse duplicates, prefixing each with its count
sort -rn | # Most frequent messages first
mail -s "Error list" EMAIL@ADDRESS

You can easily exclude lines by throwing in a grep -v at whatever point you find most appropriate. Throw it into cron for daily reports, or put it into a script and call it from logrotate.

kbenson
It's not a CPAN module, but it's sufficiently low tech to be a proper drop-in bandaid, thanks a lot. :)
Mithaldu