I need to parse recipe ingredient lines into amount, measurement, item, and description, as applicable to each line, for example "1 cup flour", "the peel of 2 lemons", and "1 cup packed brown sugar". What would be the best way of doing this? I would like to use Python for the project, so I am assuming NLTK is the best bet, but I am open to other languages.

+1  A: 

Can you be more specific about what your input is? If you just have input like this:

1 cup flour
2 lemon peels
1 cup packed brown sugar

It won't be too hard to parse it without using any NLP at all.
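
As a rough sketch of what "no NLP at all" could look like for lines shaped like the three above, a single regular expression may be enough. The `UNITS` list and the `parse_simple` name are made up for illustration, and the pattern assumes every line starts with a number:

```python
import re
from fractions import Fraction

# Hypothetical, deliberately small unit list; extend it for real data.
UNITS = ["cups", "cup", "tablespoons", "tablespoon", "tbsp",
         "teaspoons", "teaspoon", "tsp"]

# amount (integer, fraction, mixed number, or decimal), optional unit, then the item.
LINE_RE = re.compile(
    r"^\s*(?P<amount>\d+(?:\s+\d+/\d+|/\d+|\.\d+)?)\s+"
    r"(?:(?P<unit>" + "|".join(UNITS) + r")\b\s+)?"
    r"(?P<item>.+?)\s*$",
    re.IGNORECASE,
)

def parse_simple(line):
    """Return a dict with amount, unit, and item, or None if the line does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # "1 1/2" is split into "1" and "1/2" and summed as exact fractions.
    amount = sum(Fraction(part) for part in m.group("amount").split())
    unit = m.group("unit")
    return {
        "amount": amount,
        "unit": unit.lower() if unit else None,
        "item": m.group("item"),
    }

for text in ["1 cup flour", "2 lemon peels", "1 cup packed brown sugar"]:
    print(parse_simple(text))
```

Anything that does not start with an amount, such as "the peel of 2 lemons", returns None here, so this only covers the easy case.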

Claudiu
There are some examples in the question, specifically "the peel of 2 lemons". It is going to be free-typed text, so it could be just about anything that is a valid amount and item.
Greg
If you really want to be able to handle "anything", then you need a human to do the parsing, or it's an AI-level problem. That's the nature of the beast when it comes to text parsing. Make assumptions for normal cases, and assume that edge cases will fail.
Gregg Lind
+2  A: 

This is an incomplete answer, but you're looking at writing a free-text parser, which, as you know, is non-trivial :)

Some ways to cheat, using knowledge specific to cooking:

  1. Construct lists of words for the "adjectives" and "verbs", and filter against them:
    1. Measurement units form a closed set of words and abbreviations, like {L., c, cup, t, dash}.
    2. Instructions: cut, dice, cook, peel. Whatever comes after one of these is almost certain to be an ingredient.
  2. Remember that you're mostly looking for nouns, so you can take a labeled list of non-nouns (from WordNet, for example) and filter against it.
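
A minimal sketch of the word-list idea, assuming tiny hand-written sets: `UNITS`, `DESCRIPTORS`, `STOPWORDS`, and `tag_ingredient` are all illustrative names, and a real project would need much larger lists (or a WordNet lookup for the non-nouns).

```python
import re

# All three word lists are illustrative assumptions, not exhaustive vocabularies.
UNITS = {"l", "c", "cup", "cups", "t", "tsp", "tbsp", "dash"}
DESCRIPTORS = {"packed", "chopped", "diced", "peeled", "sliced", "fresh", "peel"}
STOPWORDS = {"the", "of", "a", "an"}

def tag_ingredient(line):
    """Sort each token into amount, unit, description, or item by list lookup."""
    # Numbers (optionally with a fraction or decimal part) and plain words.
    tokens = re.findall(r"\d+(?:[./]\d+)?|[A-Za-z]+", line)
    result = {"amount": [], "unit": [], "description": [], "item": []}
    for token in tokens:
        word = token.lower()
        if word[0].isdigit():
            result["amount"].append(token)
        elif word in UNITS:
            result["unit"].append(word)
        elif word in DESCRIPTORS:
            result["description"].append(word)
        elif word not in STOPWORDS:
            # Whatever survives the filters is assumed to be the ingredient noun.
            result["item"].append(word)
    return result

print(tag_ingredient("1 cup packed brown sugar"))
print(tag_ingredient("the peel of 2 lemons"))
```

Because the tokens are looked up independently of position, "the peel of 2 lemons" comes out with "2" as the amount and "lemons" as the item even though the amount is not at the start. The cost is that single-letter units like "c" and "t" can misfire on ordinary words, and lowercasing loses any "T" versus "t" distinction.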

If you're more ambitious, you can look in the NLTK Book at the chapter on parsers.

Good luck! This sounds like a mostly doable project!

Gregg Lind