I am trying to determine the best way to parse a log file and get a count of all the errors in it, grouped by type. Currently, I open the log in a text editor, strip out the date and thread ID, then sort the file. This puts all errors of the same type together, which I can then count (using the editor's count function, not manually). I am looking for a way to do this automatically, and possibly to use this as an opportunity to learn a new language (I know a little Perl and Ruby, which seem like they might work for this task). The log file looks like this (the items in angle brackets vary from line to line, while the pipes are literal characters in the log):
<Datetime stamp> | <Thread ID> | ERROR | Foo.Bar: Backend error
<Datetime stamp> | <Thread ID> | ERROR | Foo.Bar: InvalidUserException
<Datetime stamp> | <Thread ID> | ERROR | Foo.Com: Timeout error
<Datetime stamp> | <Thread ID> | ALWAYS | Foo.Bar: Login Transaction [584] executed in [400] milliseconds
<Datetime stamp> | <Thread ID> | ALWAYS | Foo.Bar: Login Transaction [585] executed in [500] milliseconds
<Datetime stamp> | <Thread ID> | ALWAYS | Foo.Bar: Login Transaction [586] executed in [500] milliseconds
<Datetime stamp> | <Thread ID> | ALWAYS | Biz.Dee: Logout Transaction [958] executed in [630] milliseconds
<Datetime stamp> | <Thread ID> | ERROR | Foo.Bar: Backend error
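Because every line has the same four pipe-separated fields, the first step of any script is splitting a line into timestamp, thread ID, level and message. This is a minimal sketch, assuming the separator is always the literal " | " shown in the sample; the names LogEntry and parse_line are hypothetical. Splitting at most three times keeps any stray " | " inside the message intact, and lines that don't fit the layout (for example stack-trace continuations) are skipped.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One parsed log line (hypothetical type, not from any existing tool)."""
    timestamp: str
    thread_id: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split a line on the assumed ' | ' separator into four fields.

    maxsplit=3 keeps any ' | ' that appears inside the message intact.
    Returns None for lines without all four fields.
    """
    parts = line.rstrip("\n").split(" | ", 3)
    if len(parts) != 4:
        return None
    return LogEntry(*(p.strip() for p in parts))


if __name__ == "__main__":
    entry = parse_line("<Datetime stamp> | <Thread ID> | ERROR | Foo.Bar: Backend error")
    print(entry.level, "->", entry.message)
```

Dropping the first two fields this way is the scripted equivalent of stripping the date and thread ID in the editor.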
I don't want to use a series of grep commands, because I would have to know in advance what to look for: if a new error appears in the log and I haven't added a command for it, it won't get counted.
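Counting with a hash (dictionary) keyed on the message text avoids the need for a predefined list of patterns: each distinct message becomes a new key the first time it is seen. This is a minimal Python sketch, assuming the "timestamp | thread | LEVEL | message" layout from the sample; count_errors is a hypothetical name. The same idea maps directly onto a Perl hash or a Ruby Hash.new(0).

```python
import sys
from collections import Counter
from typing import Iterable


def count_errors(lines: Iterable[str]) -> Counter:
    """Tally ERROR lines by their message text (e.g. 'Foo.Bar: Backend error').

    No list of known errors is needed: an unseen message simply becomes a
    new key, so a brand-new error type is counted automatically.
    """
    counts: Counter = Counter()
    for line in lines:
        # Assumed layout: timestamp | thread | LEVEL | message
        parts = line.rstrip("\n").split(" | ", 3)
        if len(parts) == 4 and parts[2].strip() == "ERROR":
            counts[parts[3].strip()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_errors.py app.log   (file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for message, n in count_errors(log).most_common():
            print(f"{message}: {n}")
```

Because an open file iterates line by line, the whole log never has to be loaded into memory at once.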
The output I am looking for is something like this:
Foo.Bar: Backend error: 2 occurrences
Foo.Com: Timeout error: 1 occurrence
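Producing exactly this report from a message-to-count mapping is mostly a matter of sorting and choosing "occurrence" or "occurrences". A small sketch follows; format_counts is a hypothetical helper, and breaking ties alphabetically is an assumption made so the output is stable between runs.

```python
from typing import List, Mapping


def format_counts(counts: Mapping[str, int]) -> List[str]:
    """Render 'message: N occurrence(s)' lines, most frequent first.

    Messages with equal counts are ordered alphabetically.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        f"{msg}: {n} occurrence{'' if n == 1 else 's'}"
        for msg, n in ordered
    ]


if __name__ == "__main__":
    demo = {"Foo.Com: Timeout error": 1, "Foo.Bar: Backend error": 2}
    print("\n".join(format_counts(demo)))
```

With the demo mapping, this prints the two lines above in the same order.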
Ideally, I would also like the average transaction times calculated:
Foo.Bar: Login Transaction: 3 occurrences with an average of 466 milliseconds
Biz.Dee: Logout Transaction: 1 occurrence with an average of 630 milliseconds
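The timing lines follow a fixed pattern, so a regular expression can pull out the source, transaction name and duration, and a running count and total per transaction gives the average. This is a sketch, assuming every timing message looks like "Foo.Bar: Login Transaction [584] executed in [400] milliseconds"; TXN_RE and transaction_report are hypothetical names. Note that (400 + 500 + 500) / 3 is 466.67, so the 466 in the desired output corresponds to truncating; the code uses floor division to match, and round() would give 467 instead.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed message shape, taken from the sample log:
#   "Foo.Bar: Login Transaction [584] executed in [400] milliseconds"
TXN_RE = re.compile(
    r"^(?P<source>\S+): (?P<name>.+?) \[\d+\] executed in \[(?P<ms>\d+)\] milliseconds$"
)


def transaction_report(lines: Iterable[str]) -> List[str]:
    """Count each transaction type and average its duration in milliseconds."""
    # key -> [count, total_ms]
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.rstrip("\n").split(" | ", 3)
        if len(parts) != 4:
            continue
        m = TXN_RE.match(parts[3].strip())
        if not m:
            continue  # not a timing line, e.g. an ERROR message
        stats = totals[f"{m['source']}: {m['name']}"]
        stats[0] += 1
        stats[1] += int(m["ms"])

    report = []
    for key, (n, total_ms) in totals.items():
        avg = total_ms // n  # floor division: 1400 // 3 == 466
        noun = "occurrence" if n == 1 else "occurrences"
        report.append(f"{key}: {n} {noun} with an average of {avg} milliseconds")
    return report
```

Passing an open log file (or any iterable of lines) to transaction_report returns the report lines in the order each transaction type first appears.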
I've seen some tools mentioned in other SO threads (SMTP log parser, Microsoft log parser, Zabbix, and Splunk), but I would also like to learn something new without unnecessarily duplicating an existing tool. Would Perl or Ruby be a good choice for this task? I am not looking for a working script, just a few pointers in the right direction or a good tool to use.