views:

2740

answers:

6

I'm writing a log collection / analysis application in Python and I need to write a "rules engine" to match and act on log messages.

It needs to feature:

  • Regular expression matching for the message itself
  • Arithmetic comparisons for message severity/priority
  • Boolean operators

An example rule would probably be something like:

(message ~ "program\\[\d+\\]: message" and severity >= high) or (severity >= critical)

I'm thinking about using PyParsing or similar to actually parse the rules and construct the parse tree.

The current (not yet implemented) design I have in mind is to have a class for each rule type, and to construct and chain instances together according to the parse tree. Each rule would then have a "match" method that takes a message object and returns whether or not the message matches the rule.

Very quickly, something like:

class RegexRule(Rule):
    def __init__(self, regex):
        self.regex = regex  # a compiled regular expression object

    def match(self, message):
        return self.regex.match(message.contents)

class SeverityRule(Rule):
    def __init__(self, operator, severity):
        self.operator = operator
        self.severity = severity

    def match(self, message):
        if self.operator == ">=":
            return message.severity >= self.severity
        # more conditions here...

class BooleanAndRule(Rule):
    def __init__(self, rule1, rule2):
        self.rule1 = rule1
        self.rule2 = rule2

    def match(self, message):
        return self.rule1.match(message) and self.rule2.match(message)

These rule classes would then be chained together according to the parse tree of the rule, and the match() method called on the top rule, which would cascade down until all the rules were evaluated.
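
To make that concrete, here's a rough, untested sketch of how I imagine the pyparsing side fitting together, using infixNotation (called operatorPrecedence in older pyparsing versions). BooleanOrRule, the severity keyword-to-number mapping, and the parse actions are all assumptions layered on top of the classes above:

import re
from pyparsing import (CaselessKeyword, QuotedString, Suppress,
                       infixNotation, oneOf, opAssoc)

# Assumed mapping from severity keywords to numeric levels.
SEVERITIES = {"debug": 0, "info": 1, "warning": 2, "high": 3, "critical": 4}

regex_term = Suppress(CaselessKeyword("message") + "~") + QuotedString('"')
regex_term.setParseAction(lambda t: RegexRule(re.compile(t[0])))

severity_term = (Suppress(CaselessKeyword("severity"))
                 + oneOf(">= > <= < == !=") + oneOf(list(SEVERITIES)))
severity_term.setParseAction(lambda t: SeverityRule(t[0], SEVERITIES[t[1]]))

def fold_binary(rule_cls):
    # Collapse "a and b and c" left to right into nested rule objects.
    def action(tokens):
        operands = list(tokens[0])[0::2]   # every other token is an operator
        result = operands[0]
        for right in operands[1:]:
            result = rule_cls(result, right)
        return result
    return action

# BooleanOrRule is an assumed OR counterpart to BooleanAndRule above.
rule_expr = infixNotation(regex_term | severity_term, [
    (CaselessKeyword("and"), 2, opAssoc.LEFT, fold_binary(BooleanAndRule)),
    (CaselessKeyword("or"), 2, opAssoc.LEFT, fold_binary(BooleanOrRule)),
])

top_rule = rule_expr.parseString(
    '(message ~ "program\\[\\d+\\]: message" and severity >= high)'
    ' or (severity >= critical)')[0]
# top_rule.match(message) then cascades down the whole tree.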

I'm just wondering if this is a reasonable approach, or if my design and ideas are totally out of whack? Unfortunately I never had the chance to take a compiler design course or anything like that in university, so I'm pretty much coming up with this stuff on my own.

Could someone with some experience in these kinds of things please chime in and evaluate the idea?

EDIT: Some good answers so far, here's a bit of clarification.

The aim of the program is to collect log messages from servers on the network and store them in the database. Apart from the collection of log messages, the collector will define a set of rules that will either match or ignore messages depending on the conditions and flag an alert if necessary.

I can't see the rules being of more than a moderate complexity, and they will be applied in a chain (list) until either a matching alert or ignore rule is hit. However, this part isn't quite as relevant to the question.
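
For context, the chain would work roughly like this (just a sketch; the action names are placeholders):

def classify(message, chain):
    # chain is an ordered list of (action, rule) pairs, e.g. ("alert", top_rule).
    for action, rule in chain:
        if rule.match(message):
            return action
    return None   # nothing matched; just store the message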

As far as the syntax being close to Python goes, yes, that is true. However, I think it would be difficult to filter Python down to the point where the user couldn't inadvertently do some crazy stuff with the rules that was not intended.

+3  A: 

See this question:
http://stackoverflow.com/questions/318888/solving-who-owns-the-zebra-programmatically

Specifically, the current accepted answer has a really great example of using the constraint module as a rules engine.

Joel Coehoorn
A: 

The only place that's different from Python syntax itself is the message ~ "program\\[\d+\\]: message" part, so I wonder if you really need a new syntax.

Update: OK, you have either usability or safety concerns -- that's reasonable. A couple suggestions:

  • Take a hint from Awk and streamline the pattern-matching syntax, e.g. /program\[\d+\]: message/ instead of message ~ "program\\[\d+\\]: message".

  • I'd implement it by translating to a Python expression as the input is parsed, instead of building a tree of objects to evaluate, unless you expect to be doing more operations on these things than evaluation. This should need less code and run faster. The top level might go something like:

    def compile(input):
        return eval('lambda message, severity: %s' % parse(input))
    
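For instance (purely a sketch; the parse() output and the numeric severity levels here are made up), the translator might turn your example rule into something like:

    import re

    # Hypothetical translation of the example rule; parse() would have to map
    # `message ~ "..."` onto re.search() and the severity names onto numbers.
    translated = (r'(re.search(r"program\[\d+\]: message", message)'
                  r' and severity >= 3) or severity >= 4')
    rule = eval('lambda message, severity: %s' % translated)

    rule("program[1234]: message", 3)   # truthy
    rule("unrelated text", 1)           # falsy
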

Another idea, further afield: write your app in Lua. It's designed for nonprogrammers to extend programs reasonably safely without needing to learn a lot. (It's been used that way successfully, and you can sandbox evaluation so user code can't get at any capabilities you don't pass to it explicitly, I'm told.)

I'll shut up now. :-) Good luck!

Darius Bacon
A: 

It's a little hard to answer the question without knowing what the scope of the application is.

  • What are you trying to reason about?
  • What level of analysis are you talking about?
  • How complicated do you see the rules becoming?
  • How complicated is the interplay between different rules?

On one end of the spectrum is a simple one-off approach like you have proposed. This is fine if the rules are few, relatively simple, and you're not doing anything more complicated than aggregating log messages that match the specified rules.

On the other side of the spectrum is a heavy-weight reasoning system, something like CLIPS that has a Python interface. This is a real rules engine with inferencing and offers the ability to do sophisticated reasoning. If you're building something like a diagnostic engine that operates off of a program log this could be more appropriate.

EDIT:

I would say that the current implementation idea is fine for what you're doing. Anything much more and I think you probably run the risk of over-engineering the solution. It seems to capture what you want, matching on the log messages just based on a few different criteria.

+16  A: 

Do not invent yet another rules language.

Either use Python or use some other existing, already debugged and working language like BPEL.

Just write your rules in Python, import them and execute them. Life is simpler, far easier to debug, and you've actually solved the actual log-reading problem without creating another problem.

Imagine this scenario. Your program breaks. It's now either the rule parsing, the rule execution, or the rule itself. You must debug all three. If you wrote the rule in Python, it would be the rule, and that would be that.

"I think it would be difficult to filter the Python down to the point where the user couldn't inadvertently do some crazy stuff with the rules that was not intended."

This is largely the "I want to write a compiler" argument.

1) You're the primary user. You'll write, debug and maintain the rules. Are there really armies of crazy programmers who will be doing crazy things? Really? If there is any potential crazy user, talk to them. Teach them. Don't fight them by inventing a new language (which you will then have to maintain and debug forever).

2) It's just log processing. There's no real cost to the craziness. No one is going to subvert the world economic system with faulty log handling. Don't turn a small task that needs a few dozen lines of Python into a 1000-line interpreter for a few dozen lines of some rule language. Just write the few dozen lines of Python.

Just write it in Python as quickly and clearly as you can and move on to the next project.
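
A rules "module" can be nothing more than ordinary functions. A sketch (the HIGH/CRITICAL constants and the message attributes are assumed to match whatever your collector defines):

    # rules.py -- plain Python rules imported by the collector.
    import re

    HIGH, CRITICAL = 4, 5   # assumed numeric severity levels

    def program_message(message):
        # Alert on "program[NNN]: message" at high severity, or anything critical.
        return bool(re.search(r"program\[\d+\]: message", message.contents)
                    and message.severity >= HIGH) or message.severity >= CRITICAL

    ALERT_RULES = [program_message]

The collector just does import rules and walks rules.ALERT_RULES; debugging a rule is debugging ordinary Python.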

S.Lott
I know it's been a while, but I had to think about this solution for quite some time before I was convinced. However, after reading your recent blog post that referenced this question and using Zenoss, which does exactly what you propose, I have to say that I'm sold.
Kamil Kisiel
For completeness, here's the URL to the blog post: http://homepage.mac.com/s_lott/iblog/architecture/C412398194/E20090220060355/index.html
Kamil Kisiel
+1 -- domain specific languages tend to look like Python without the parens anyway (i.e. http://twill.idyll.org/ ).
cdleary
A: 

You might also want to look at PyKE.

+1  A: 

Or even pychinko, a Rete-based, RDF-friendly rule engine.