I'm working with a legacy Java app that has no logging and just prints all information to the console. Most exceptions are also "handled" by just doing a printStackTrace() call.

In a nutshell, I've just redirected the System.out and System.err streams to a log file, and now I need to parse that log file. So far, so good, but I'm having problems trying to parse the log file for stack traces.

Some of the code is obfuscated as well, so I need to run the stack traces through a utility app to de-obfuscate them. I'm trying to automate all of this.

The closest I've come so far is to get the initial Exception line using this:

.+Exception[^\n]+

And finding the "at ..(..)" lines using:

(\t+\Qat \E.+\s+)+

But I can't figure out how to put them together to get the full stacktrace.

Basically, the log files look something like the following. There is no fixed structure, and the lines before and after stack traces are completely random:

Modem ERROR (AT
Owner: CoreTalk
) - TIMEOUT
IN []
Try Open: COM3


javax.comm.PortInUseException: Port currently owned by CoreTalk
    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)
...
    at UniPort.modemService.run(modemService.java:103)
Handling file: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify
java.io.FileNotFoundException: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
...
    at com.gobackbone.Store.a.a.handle(Unknown Source)
    at com.jniwrapper.win32.io.FileSystemWatcher.fireFileSystemEvent(FileSystemWatcher.java:223)
...
    at java.lang.Thread.run(Unknown Source)
Load Additional Ports
... Lots of random stuff
IN []

[Fatal Error] .xml:6:114: The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
...
    at com.gobackbone.Store.a.a.run(Unknown Source)
A: 

We have been using ANTLR to tackle the parsing of logfiles (in a different application area). It's not trivial but if this is a critical task for you it will be better than using regexes.

peter.murray.rust
It's not particularly critical, just something I'm doing in my free time to make it easier for us to read the log files when we need to support a client. ANTLR seems like overkill.
Riaan Cornelius
+1  A: 

Looks like you just need to paste them together (and use a newline as glue):

.+Exception[^\n]+\n(\t+\Qat \E.+\s+)+

But I would change your regex a bit:

^.+Exception[^\n]++(\s+at .++)+

This combines the whitespace between the at... lines and uses possessive quantifiers to avoid backtracking.
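
For anyone wanting to try this from Java, here is a minimal sketch that applies the second regex unchanged. The class name `StackTraceExtractor`, the `extract` helper and the shortened log excerpt are made up for illustration, and it assumes the pattern is compiled with `Pattern.MULTILINE` so that `^` can match at the start of any line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StackTraceExtractor {
    // The regex from the answer, unchanged. MULTILINE lets ^ match at the
    // start of every line instead of only at the start of the input.
    private static final Pattern TRACE = Pattern.compile(
            "^.+Exception[^\\n]++(\\s+at .++)+", Pattern.MULTILINE);

    /** Returns each exception header plus its "at ..." lines as one string. */
    static List<String> extract(String log) {
        List<String> traces = new ArrayList<>();
        Matcher m = TRACE.matcher(log);
        while (m.find()) {
            traces.add(m.group());
        }
        return traces;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt modelled on the sample in the question.
        String log = String.join("\n",
                "Try Open: COM3",
                "javax.comm.PortInUseException: Port currently owned by CoreTalk",
                "    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)",
                "    at UniPort.modemService.run(modemService.java:103)",
                "Handling file: notify",
                "java.io.FileNotFoundException: notify (The system cannot find the file specified)",
                "\tat java.io.FileInputStream.open(Native Method)",
                "Load Additional Ports");
        for (String trace : extract(log)) {
            System.out.println("--- trace ---");
            System.out.println(trace);
        }
    }
}
```

Two limits are worth knowing. `[^\n]++` needs at least one character after `Exception`, so a header with no message (a bare `java.lang.NullPointerException` line) is skipped unless it is relaxed to `[^\n]*+`. And if literal `...` lines really do sit between the frames, the match stops at the first one.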

Tim Pietzcker
That will only find the first "at" line, not all of them.
Riaan Cornelius
Have you tried it? That's what the final `+` is for. Or can there be something between the "at" lines (are these `...` present in the actual log files)? Also, in your example text (at least as posted here) the "at" lines start with spaces, not tabs. My second regex should have handled this, though.
Tim Pietzcker
Sorry, I didn't see the second line for some reason... Using `^.+Exception[^\n]++(\s+at .++)+` I don't get any matches... What does the `++` do? Is that just shorthand for `(^.+Exception[^\n]+)+((\s+at .+)+)+`?
Riaan Cornelius
Actually, scratch that... I'm somewhat confused by this, but removing the leading `^` makes it work. The exception line is definitely at the start of the line, but it works without that...
Riaan Cornelius
Occasionally, there are other lines mixed in with the stack traces, but I won't bother dealing with that. If we need to check those manually we'll deal with it. The stack traces sometimes have leading spaces and sometimes have tabs. It looks like there are places in the app where the stack traces are manually printed out...
Riaan Cornelius
`^` means "start of line" if you compile the regex using the `Pattern.MULTILINE` option (else it means "start of string"). A second `+` makes a quantifier "possessive", which means that unlike a single `+`, a `++` will never give up something it has matched. This avoids backtracking and thus speeds up regexes. But you can't always use it; e.g. if you made the first `+` possessive, the regex could never match.
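
Both points can be checked with a small sketch. The class name `RegexFlagDemo`, the `found` helper and the two-line log are hypothetical; only the regexes come from this thread.

```java
import java.util.regex.Pattern;

final class RegexFlagDemo {
    // Regex from the answer: the first quantifier is an ordinary greedy +.
    static final String GREEDY_FIRST = "^.+Exception[^\\n]++(\\s+at .++)+";
    // Same regex with the first quantifier made possessive.
    static final String POSSESSIVE_FIRST = "^.++Exception[^\\n]++(\\s+at .++)+";

    // Hypothetical input in which the trace does not start on the first line.
    static final String LOG = String.join("\n",
            "Try Open: COM3",
            "javax.comm.PortInUseException: Port currently owned by CoreTalk",
            "    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)");

    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without MULTILINE, ^ matches only at the very start of the input,
        // and the first line contains no "Exception": prints false.
        System.out.println(found(GREEDY_FIRST, 0, LOG));
        // With MULTILINE, ^ also matches at the start of line 2: prints true.
        System.out.println(found(GREEDY_FIRST, Pattern.MULTILINE, LOG));
        // A possessive .++ swallows the whole line and never gives any of it
        // back, so "Exception" has nothing left to match: prints false.
        System.out.println(found(POSSESSIVE_FIRST, Pattern.MULTILINE, LOG));
    }
}
```

The first result would explain why dropping the `^` appeared to fix things: without the flag, the anchored regex can only match a trace that begins on the very first line of the file.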
Tim Pietzcker
A: 

I get good results using

perl -n -e 'm/(Exception)|(\tat )/ && print' /var/log/jboss4.2/debian/server.log 

It prints every line that contains `Exception` or `\tat `. Because the file is filtered line by line in a single pass, the original order of the lines is kept.
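
If the filtering has to happen inside a Java tool instead, a rough counterpart could look like the sketch below. The class name `TraceLineFilter` and its `filter` method are hypothetical. It also departs from the one-liner in one respect: it uses `^\s+at ` in place of `\tat `, on the assumption (taken from the comments on the other answer) that frames may be indented with spaces as well as tabs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TraceLineFilter {
    // Keep a line if it mentions "Exception" anywhere, or if it starts with
    // whitespace followed by "at " (tab- or space-indented stack frames).
    private static final Pattern KEEP = Pattern.compile("Exception|^\\s+at ");

    /** Returns the matching lines in their original order. */
    static List<String> filter(String log) {
        List<String> kept = new ArrayList<>();
        // \R matches any line break (\n, \r\n, ...); requires Java 8 or later.
        for (String line : log.split("\\R")) {
            if (KEEP.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt modelled on the sample in the question.
        String log = "Try Open: COM3\n"
                + "javax.comm.PortInUseException: Port currently owned by CoreTalk\n"
                + "    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)\n"
                + "Load Additional Ports\n";
        filter(log).forEach(System.out::println);
    }
}
```

Like the Perl version, this keeps any line that merely mentions `Exception`, so it is a coarse filter, not a way to delimit individual traces.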

Peter Tillemans