I have several sets of documents with a semi-regular format. Rows are typically separated by newline characters, and the main components of each row are separated by spaces. Examples include sets of furniture assembly instructions, tables of contents, recipes and bank statements.
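To make the layout concrete, here is a minimal sketch of splitting such a document into rows (on newlines) and row components (on whitespace). The function name `split_rows` and the sample text are hypothetical, made up for illustration:

```python
def split_rows(text: str) -> list[list[str]]:
    """Hypothetical helper: split a semi-regular document into rows (one per
    line) and each row into whitespace-separated components, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()  # splits on any run of whitespace
        if parts:
            rows.append(parts)
    return rows

# Made-up recipe fragment: quantity first, quantity last, then a note.
sample = "2 cups flour\nsugar 1 cup\n\nMix well before baking.\n"
print(split_rows(sample))
# [['2', 'cups', 'flour'], ['sugar', '1', 'cup'], ['Mix', 'well', 'before', 'baking.']]
```

Splitting is the easy part; deciding what each component *means* is where it gets hard.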
The problem is that each specimen in a set differs from its peers in ways that make regex parsing infeasible: the quantity of an item may come before or after the item name, the same item may have different names in different specimens, expository text or notes may appear between rows, and so on.
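For example, a single fixed-order pattern (a hypothetical `quantity unit name` regex, written only to illustrate the problem) both misses a row whose quantity comes after the name and wrongly accepts a note that happens to start with a number:

```python
import re

# Hypothetical fixed-order pattern: quantity, then unit, then item name.
QTY_FIRST = re.compile(r"^(\d+)\s+(\w+)\s+(.+)$")

def parse_fixed(row: str):
    """Return (quantity, unit, name) if the row matches the fixed order, else None."""
    m = QTY_FIRST.match(row)
    return m.groups() if m else None

# Made-up rows: quantity first, quantity last, and an expository note.
for row in ["2 cups flour", "flour 2 cups", "20 minutes at 180 degrees"]:
    print(row, "->", parse_fixed(row))
# 2 cups flour -> ('2', 'cups', 'flour')
# flour 2 cups -> None
# 20 minutes at 180 degrees -> ('20', 'minutes', 'at 180 degrees')
```

Adding more patterns for each variant quickly becomes unmanageable across a whole set of specimens.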
I've used classifiers (neural nets, Bayesian classifiers, GA and GP) on whole documents or data sets, but never to extract items from a document and classify them in context. Can this be done? Is there a more feasible approach?