Hi. I'm trying to use Python to parse a log file and match four pieces of information in one regex (epoch time, SERVICE NOTIFICATION, hostname and CRITICAL), but I can't seem to get it to work. So far I've only been able to match two of the four. Is it possible to do this? Below is an example line from the log file and the code I've gotten to work so far. Any help would make me a happy noob.

[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call

hostname = options.hostname

n = open('/var/tmp/nagios.log', 'r')
n.readline()
l = [str(x) for x in n]
for line in l:
    match = re.match(r'^\[(\d+)\] SERVICE NOTIFICATION: ', line)
    if match:
        timestamp = int(match.groups()[0])
        print(timestamp)
A: 

You can use | to match any one of various possible things, and re.findall to get all non-overlapping matches to some RE.
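A minimal sketch of that idea against the sample line from the question. The literal alternatives in the pattern are assumptions chosen for illustration (a 10-digit epoch time, the SERVICE ALERT label and the sample hostname). Note that findall returns every occurrence, so CRITICAL shows up once for each time it appears in the line.

```python
import re

line = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
        'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

# Alternation: each branch matches one of the pieces of interest.
# The branches are illustrative assumptions, not taken from the question.
pattern = re.compile(r'\d{10}|SERVICE ALERT|myhostname\.com|CRITICAL')

# findall returns all non-overlapping matches, in order of appearance.
found = pattern.findall(line)
print(found)
```

This yields a flat list rather than named fields, so you would still need to work out which item is which.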

Alex Martelli
A: 

Could it be as simple as "SERVICE NOTIFICATION" in your pattern doesn't match "SERVICE ALERT" in your example?
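A quick way to check this, using the sample line and the pattern from the question; the only change in the second pattern is the label.

```python
import re

line = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
        'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

# Pattern as posted in the question.
notification = re.match(r'^\[(\d+)\] SERVICE NOTIFICATION: ', line)
# Same pattern with the label that actually appears in the sample line.
alert = re.match(r'^\[(\d+)\] SERVICE ALERT: ', line)

print(notification)    # None, because the sample line says ALERT
print(alert.group(1))  # the captured epoch time, as a string
```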

Oddthinking
+1  A: 

The question is a bit confusing, but you don't need to do everything with regular expressions. There are some good plain old string functions you might want to try, like 'split'.

This version also avoids loading the entire file into memory at once, and it closes the file even when an exception is thrown.

import re

# Capture the epoch time and the hostname from the first ';'-separated field.
regexp = re.compile(r'\[(\d+)\] SERVICE NOTIFICATION: (.+)')
with open('/var/tmp/nagios.log', 'r') as log_file:
    for line in log_file:  # reads one line at a time
        fields = line.split(';')
        match = regexp.match(fields[0])
        if match:
            timestamp = int(match.group(1))
            hostname = match.group(2)
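The same split-based approach might be extended to pick up all four pieces, roughly as sketched here. parse_line and HEADER are hypothetical names, and the field positions are inferred from the single sample line in the question; other kinds of log line may lay out their fields differently.

```python
import re

# Hypothetical helper; not part of the original answer.
# Matches the first ';'-separated field: "[epoch] LABEL: hostname".
HEADER = re.compile(r'\[(\d+)\] ([A-Z ]+): (.+)')

def parse_line(line):
    """Return (timestamp, kind, hostname, state), or None if the line does not fit."""
    fields = line.rstrip('\n').split(';')
    match = HEADER.match(fields[0])
    if match is None or len(fields) < 3:
        return None
    timestamp = int(match.group(1))
    kind = match.group(2)      # e.g. 'SERVICE ALERT'
    hostname = match.group(3)
    state = fields[2]          # assumed position of the state, e.g. 'CRITICAL'
    return timestamp, kind, hostname, state

sample = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
          'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')
result = parse_line(sample)
print(result)
```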
Dietrich Epp
+1  A: 

If you are looking to split out those particular parts of the line, then something along the lines of:

match = re.match(r'^\[(\d+)\] (.*?): (.*?);.*?;(.*?);',line)

This should give each of those parts at its respective index in match.groups().
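For instance, run against the sample line from the question, the groups could be unpacked like this. The variable names are only illustrative.

```python
import re

line = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
        'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

match = re.match(r'^\[(\d+)\] (.*?): (.*?);.*?;(.*?);', line)
if match:
    # One name per capture group, in the order they appear in the pattern.
    timestamp, kind, hostname, state = match.groups()
    print(match.groups())
```

All four values come back as strings, so the timestamp still needs int() if you want a number.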

+2  A: 

You can use more than one group at a time, e.g.:

import re

logstring = '[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call'
# Four capture groups: epoch time, alert type, hostname, state.
exp = re.compile(r'^\[(\d+)\] ([A-Z ]+): ([A-Za-z0-9.\-]+);[^;]+;([A-Z]+);')
m = exp.search(logstring)

for s in m.groups():
    print(s)
Mike Kale
Just FYI, exp.match(logstring) works just as well in this example, i.e., the solution ISN'T to use search() instead of match().
Jon-Eric
Sure, good point. I'm in the habit of using search instead of match, but since we're starting at the beginning of the string it's the same thing. The key is adding four different grouping parens to grab the four things the OP wants.
Mike Kale
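The match()/search() point from the comments can be checked with a short sketch. The anchored pattern is the one from the answer; the unanchored pattern is a hypothetical one added only to show where the two calls differ.

```python
import re

logstring = ('[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;'
             'CRITICAL;SOFT;1;CRITICAL - Plugin timed out while executing system call')

anchored = re.compile(r'^\[(\d+)\] ([A-Z ]+): ([A-Za-z0-9.\-]+);[^;]+;([A-Z]+);')
# Hypothetical pattern that can only match in the middle of the line.
unanchored = re.compile(r'([A-Z]+);SOFT')

# With a pattern that starts at the beginning, both calls agree.
same = anchored.match(logstring).groups() == anchored.search(logstring).groups()
print(same)

# match() only tries position 0; search() scans forward through the string.
at_start = unanchored.match(logstring)
anywhere = unanchored.search(logstring)
print(at_start)
print(anywhere.group(1))
```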