I am seeking direction and attempting to label this problem:

I am attempting to build a simple inference engine (is there a better name?) in Python which will take a string and:

1 - create a list of tokens by splitting the string on whitespace

2 - categorise these tokens using regular expressions

3 - use a higher-level set of rules to make decisions based on the categorisations

Example:

"90001" - one token, matches the zipcode regex, a rule exists for a string containing just a zipcode causes a certain behaviour to occur

"30 + 14" - three tokens, regexs for numerical value and mathematical operators match, a rule exists for a numerical value followed by a mathematical operator followed by another numerical value causes a certain behaviour to occur

I'm struggling with how best to do step #3, the higher-level set of rules. I'm sure that some framework must exist. Any ideas? Also, how would you characterise this problem? Rule-based system, expert system, inference engine, something else?

Thanks!

+3  A: 

I'm very surprised that step #3 is the one giving you trouble...

Assuming you can properly label/categorise each token (and that, prior to categorisation, you can find the proper tokens, as there may be many ambiguous cases...), the step #3 problem seems like one that could easily be tackled with a context-free grammar, where each of the desired actions (such as a ZIP code lookup or a mathematical expression calculation) would be a symbol whose production rule is made up of the possible token categories. To illustrate this in BNF notation, we could have something like

<SimpleMathOperation> ::= <NumericalValue> <Operator> <NumericalValue>

Maybe your concern is that when things get complicated, it will become difficult to express the whole requirement in terms of non-conflicting grammar rules. Or maybe your concern is that rules could be added dynamically, forcing the grammar "compilation" logic to be integrated into the program? Whatever the concern, I think this third step will be comparatively trivial.
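
To make that point concrete, here is a naive table-driven take on step #3, assuming the token categories from the question; the rule keys and handlers are hypothetical, and adding a rule at runtime is just adding a dictionary entry:

    # Hypothetical rules: a tuple of token categories maps to an action.
    def lookup_zipcode(tokens):
        print("look up ZIP code", tokens[0])

    def evaluate_math(tokens):
        a, op, b = tokens
        ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
               "*": lambda x, y: x * y, "/": lambda x, y: x / y}
        print("result:", ops[op](float(a), float(b)))

    RULES = {
        ("zipcode",): lookup_zipcode,
        ("number", "operator", "number"): evaluate_math,
    }

    def dispatch(tokens, categories):
        # Step 3: pick the action registered for this category sequence.
        action = RULES.get(tuple(categories))
        if action is None:
            raise ValueError("no rule matches %r" % (categories,))
        action(tokens)

    dispatch(["30", "+", "14"], ["number", "operator", "number"])  # result: 44.0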

On the other hand, unless the various categories (and the underlying input text) are such that they can also be described with a regular language (as you seem to hint in the question), a text parser and classifier (steps #1 and #2) is typically a less-than-trivial affair.

Some Python libraries simplify writing and evaluating grammars; pyparsing is one such example.
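
As a sketch, the BNF rule above might be expressed in pyparsing roughly as follows (the regex and the operator set are assumptions for the example):

    from pyparsing import Regex, oneOf

    # Mirrors <SimpleMathOperation> ::= <NumericalValue> <Operator> <NumericalValue>
    numerical_value = Regex(r"\d+(\.\d+)?")
    operator = oneOf("+ - * /")
    simple_math_operation = numerical_value + operator + numerical_value

    print(simple_math_operation.parseString("30 + 14").asList())  # ['30', '+', '14']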

mjv
Thanks for the pointer to pyparsing. A CFG is the way to go.
Art
@Art, the credit for the parsing libraries goes to Max S who kindly and appropriately edited the answer. I'll try and upvote some of his own answers to "show him" ;-)
mjv