I am seeking direction and attempting to label this problem:

I am attempting to build a simple inference engine (is there a better name?) in Python which will take a string and:

1 - create a list of tokens by splitting the string on whitespace

2 - categorise these tokens using regular expressions

3 - use a higher-level set of rules to make decisions based on the categorisations

Example:

"90001" - one token, matches the zipcode regex, a rule exists for a string containing just a zipcode causes a certain behaviour to occur

"30 + 14" - three tokens, regexs for numerical value and mathematical operators match, a rule exists for a numerical value followed by a mathematical operator followed by another numerical value causes a certain behaviour to occur

I'm struggling with how best to do step #3, the higher-level set of rules. I'm sure that some framework must exist. Any ideas? Also, how would you characterise this problem? Rule-based system, expert system, inference engine, something else?

Thanks!

+3  A: 

I'm very surprised that step #3 is the one giving you trouble...

Assuming you can properly label/categorise each token (and that, prior to categorisation, you can find the proper tokens, as there may be many ambiguous cases...), the step #3 problem seems like one that could easily be tackled with a context-free grammar, where each of the desired actions (such as a ZIP code lookup or a mathematical expression calculation) would be a symbol whose production rule is made up of the possible token categories. To illustrate this in BNF notation, we could have something like

<SimpleMathOperation> ::= <NumericalValue> <Operator> <NumericalValue>

Maybe your concern is that when things get complicated, it will become difficult to express the whole requirement in terms of non-conflicting grammar rules. Or maybe your concern is that rules could be added dynamically, forcing the grammar "compilation" logic to be integrated into the program? Whatever the concern, I think this third step will be comparatively trivial.
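
To make that point concrete, here is a naive table-driven take on step #3, assuming the token categories from the question; the rule keys and handlers are hypothetical, and adding a rule at runtime is just adding a dictionary entry:

    # Hypothetical rules: a tuple of token categories maps to an action.
    def lookup_zipcode(tokens):
        print("look up ZIP code", tokens[0])

    def evaluate_math(tokens):
        a, op, b = tokens
        ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
               "*": lambda x, y: x * y, "/": lambda x, y: x / y}
        print("result:", ops[op](float(a), float(b)))

    RULES = {
        ("zipcode",): lookup_zipcode,
        ("number", "operator", "number"): evaluate_math,
    }

    def dispatch(tokens, categories):
        # Step 3: pick the action registered for this category sequence.
        action = RULES.get(tuple(categories))
        if action is None:
            raise ValueError("no rule matches %r" % (categories,))
        action(tokens)

    dispatch(["30", "+", "14"], ["number", "operator", "number"])  # result: 44.0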

On the other hand, unless the various categories (and the underlying input text) are such that they can also be described with a regular language (as you seem to hint in the question), a text parser and classifier (steps #1 and #2) is typically a less-than-trivial affair.

Some Python libraries simplify writing and evaluating grammars; pyparsing is one such example.
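
As a sketch, the BNF rule above might be expressed in pyparsing roughly as follows (the regex and the operator set are assumptions for the example):

    from pyparsing import Regex, oneOf

    # Mirrors <SimpleMathOperation> ::= <NumericalValue> <Operator> <NumericalValue>
    numerical_value = Regex(r"\d+(\.\d+)?")
    operator = oneOf("+ - * /")
    simple_math_operation = numerical_value + operator + numerical_value

    print(simple_math_operation.parseString("30 + 14").asList())  # ['30', '+', '14']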

mjv
Thanks for the pointer to pyparsing. A CFG is the way to go.
Art
@Art, the credit for the parsing libraries goes to Max S who kindly and appropriately edited the answer. I'll try and upvote some of his own answers to "show him" ;-)
mjv