views:

1053

answers:

9

I use grep to parse through my trading apps logs, but it's limited in the sense that I need to visually trawl through the output to see what happened etc.

I'm wondering if Perl is a better option? Any good resources to learn log and string parsing with Perl?

I'd also believe that Python would be good for this. Perl vs Python vs 'grep on linux'?

+5  A: 

In the end, it really depends on how much semantics you want to identify, whether your logs fit common patterns, and what you want to do with the parsed data.

If you can use regular expressions to find what you need, you have tons of options. Perl is a popular language and has very convenient native RE facilities. I personally feel a lot more comfortable with Python and find that the little added hassle for doing REs is not significant.

If you want to do something smarter than RE matching, or want to have a lot of logic, you may be more comfortable with Python or even with Java/C++/etc. For instance, it is easy to read line-by-line in Python and then apply various predicate functions and reactions to matches, which is great if you have a ruleset you would like to apply.

Uri
+3  A: 

All scripting languages are good candidates: Perl, Python, Ruby, PHP, and AWK are all fine for this. Using any one of these languages are better than peering at the logs starting from a (small) size.

Wearing Ruby Slippers to Work is an example of doing this in Ruby, written in Why's inimitable style. Here's a basic example in Perl. I suggest you choose one of these languages and start cracking.

Yuval F
A: 

I find this list invaluable when dealing with any job that requires one to parse with python.

I wouldn't use perl for parsing large/complex logs - just for the readability (the speed on perl lacks for me (big jobs) - but that's probably my perl code (I must improve)).

However if grep suits your needs perfectly for now - there really is no reason to get bogged down in writing a full blown parser. Simplest solution is usually the best, and grep is a fine tool.

Nazarius Kappertaal
+2  A: 

A big advantage Perl has over Python is that when parsing text is the ability to use regular expressions directly as part of the language syntax. For example:

if ($line =~ m/^Regex/) {
    ... code goes here
}

Perl also assigns capture groups directly to $1, $2, etc, making it very simple to work with. Depending on the format and structure of the logfiles you're trying to parse, this could prove to be quite useful (or, if it can be parsed as a fixed width file or using simpler techniques, not very useful at all).

It's all just syntactic sugar, really, and other languages also allow you use regular expressions and capture groups (indeed, the linked article shows how to do it in Python). You just have to write a bit more code and pass around objects to do it.

Adam Luchjenbroers
the ability to use regex with Perl is not a big advantage over Python, because firstly, Python has regex as well, and secondly, regex is not always the better solution.
ghostdog74
It's still simpler to use Regexes in Perl than in another language, due to the ability to use them directly. And yes, sometimes regex isn't the right solution, thats why I said 'depending on the format and structure of the logfiles you're trying to parse'
Adam Luchjenbroers
C'mon, it's not that hard to use regexes in Python. If you're arguing over mere syntax then you really aren't arguing anything worthwhile. Perl has some regex features that Python doesn't support, but most people are unlikely to need them. As for capture buffers, Python was ahead of the game with labeled captures (which Perl now has too).
brian d foy
+2  A: 

There's a Perl program called Log_Analysis that does a lot of analysis and preprocessing for you.

Carl Smotricz
+1  A: 

Another possible interpretation of your question is "Are there any tools that make log monitoring easier?", and to answer that I would suggest you have a look at Splunk or maybe Log4view.

ehnmark
Sprog is pretty nifty too: http://sprog.sourceforge.net/
daotoad
Octopussy is nice too (disclaimer: my project): http://www.8pussy.org
sebthebert
+1  A: 

on linux, you can use just the shell(bash,ksh etc) to parse log files if they are not too big in size. The other tools to go for are usually grep and awk. However, for more programming power, awk is usually used. If you have big files to parse, try awk.

Of course, Perl or Python or practically any other languages with file reading and string manipulation capabilities can be used as well.

ghostdog74
A: 

Sometimes I use to colorize a logfile to make better sense of it. Another SO question about cgrep

epatel
+1  A: 

Learning a programming language will let you take you log analysis abilities to another level.

Any dynamic or "scripting" language like Perl, Ruby or Python will do the job. What you should use really depends on external factors. Among the things you should consider:

  • does work already use a suitable langauge?
  • do you know anyone who can mentor you in a suitable language?
  • try each language a little and see which language fits you better.

Personally, for the above task I would use Perl. YMMV.

Several reasons to like Perl:

Powerful one-liners - if you need to do a real quick, one-off job, Perl offers some really great short-cuts. See perlrun -n for one example

Multi-paradigm language - Perl has support for imperative, functional and object-oriented programming methodologies.

Sigils - those leading punctuation characters on variables like $foo or @bar. They are a bit like hungarian notation without being so annoying.

Moose - an incredible new OOP system that provides powerful new OO techniques for code composition and reuse.

Strictures - the use strict pragma catches many errors that other dynamic languages gloss over at compile time. I miss it terribly when I use Python or PHP.

Self-discipline - Perl gives you the freedom to write and do what you want, when you want. This means that you have to learn to write clean code or you will hurt. Fortunately, there are tools to help a beginner. Perl::Critic does lint-like analysis of code for best practices.

daotoad