I need to parse recipe ingredients into amount, measurement, item, and description, as applicable to each line: for example "1 cup flour", "the peel of 2 lemons", and "1 cup packed brown sugar". What would be the best way of doing this? I am interested in using Python for the project, so I am assuming NLTK is the best bet, but I am open to other languages.
+1
A:
Can you be more specific about what your input is? If you just have input like this:
1 cup flour
2 lemon peels
1 cup packed brown sugar
It won't be too hard to parse it without using any NLP at all.
Claudiu
2008-10-15 08:22:58
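For input shaped like the three lines above, a regular expression really is enough. A minimal sketch, assuming every line starts with a whole number, a fraction, or a mixed number such as "1 1/2", optionally followed by a unit from a small hand-written set (the `UNITS` set and the `parse_simple` name are illustrative, not from any library):

```python
import re
from fractions import Fraction

# Hypothetical, deliberately small unit list; extend it for real recipes.
UNITS = {"cup", "cups", "tsp", "tbsp", "teaspoon", "teaspoons",
         "tablespoon", "tablespoons", "oz", "lb", "g", "ml"}

# amount: "2", "1/2" or a mixed number like "1 1/2"; rest: everything after it
LINE_RE = re.compile(
    r"^\s*(?P<amount>\d+(?:\s+\d+/\d+)?|\d+/\d+)\s+(?P<rest>\S.*?)\s*$")


def parse_amount(text):
    """Turn "2", "1/2" or "1 1/2" into a Fraction."""
    return sum(Fraction(part) for part in text.split())


def parse_simple(line):
    """Split "amount [unit] item"; return None if the line doesn't fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    words = m.group("rest").split()
    unit = None
    if words[0].lower() in UNITS:
        unit = words.pop(0).lower()
    return {"amount": parse_amount(m.group("amount")),
            "unit": unit,
            "item": " ".join(words)}
```

On "1 cup packed brown sugar" this gives amount 1, unit "cup" and item "packed brown sugar"; splitting "packed" off as a description needs word lists, and "the peel of 2 lemons" simply returns None.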
There are some examples above, specifically "the peel of 2 lemons". It is going to be free-typed text, so it could be just about anything that is a valid amount and item.
Greg
2008-10-15 15:14:52
If you really want to be able to handle "anything", then you need a human to do the parsing, or it's an AI-level problem. That's the nature of the beast when it comes to text parsing. Make assumptions for the normal cases, and assume that edge cases will fail.
Gregg Lind
2008-10-24 13:56:46
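One way to follow that advice is to try a short list of patterns, one per common phrasing, and flag anything that matches none of them for a human to fix. A sketch assuming three hypothetical phrasings ("1 cup flour", "the peel of 2 lemons", "2 lemon peels"); the rule names and the unit alternation are illustrative only:

```python
import re

# (rule name, pattern) pairs, most specific first; named groups become fields.
RULES = [
    ("amount_unit_item",
     re.compile(r"^(?P<amount>\d+(?:/\d+)?)\s+(?P<unit>cups?|tsp|tbsp|oz|lbs?)\s+(?P<item>.+)$",
                re.IGNORECASE)),
    ("part_of_amount_item",  # e.g. "the peel of 2 lemons"
     re.compile(r"^(?:the\s+)?(?P<description>\w+)\s+of\s+(?P<amount>\d+)\s+(?P<item>.+)$",
                re.IGNORECASE)),
    ("amount_item",          # catch-all: "2 lemon peels"
     re.compile(r"^(?P<amount>\d+(?:/\d+)?)\s+(?P<item>.+)$")),
]


def parse_line(line):
    """Return the fields of the first matching rule, or a record
    flagged for human review when no rule matches."""
    text = line.strip()
    for name, pattern in RULES:
        m = pattern.match(text)
        if m:
            return {"rule": name, **m.groupdict()}
    return {"rule": None, "needs_review": True, "raw": text}
```

Order matters: "2 lemon peels" only reaches the catch-all rule after the unit rule fails. A line such as "a pinch of salt" comes back with `needs_review` set instead of being guessed at.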
+2
A:
This is an incomplete answer, but you're looking at writing a free-text parser, which, as you know, is non-trivial :)
Some ways to cheat, using knowledge specific to cooking:
- Construct lists of words for the "adjectives" and "verbs", and filter against them
- Measurement units form a closed set of words and abbreviations, like {L., c, cup, t, dash}
- Instructions (cut, dice, cook, peel): things that come after these are almost certain to be ingredients
- Remember that you're mostly looking for nouns, and you can take a labeled list of non-nouns (from WordNet, for example) and filter against them.
If you're more ambitious, you can look in the NLTK Book at the chapter on parsers.
Good luck! This sounds like a mostly doable project!
Gregg Lind
2008-10-20 14:40:57
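The cooking-specific tricks in the answer above (a closed set of units and abbreviations, plus word lists to filter against) can be sketched without any NLP library. The word lists below are small hypothetical examples; the sketch assumes "T" means tablespoon and "t" means teaspoon, a common recipe convention, and treats part words such as "peel" as the description:

```python
import re
from fractions import Fraction

# Hypothetical word lists; a real project would grow these from its own data.
UNIT_ALIASES = {
    "c": "cup", "cup": "cup", "cups": "cup",
    "t": "teaspoon", "tsp": "teaspoon", "teaspoon": "teaspoon", "teaspoons": "teaspoon",
    "T": "tablespoon", "tbsp": "tablespoon", "tablespoon": "tablespoon",
    "tablespoons": "tablespoon", "dash": "dash", "pinch": "pinch",
}
DESCRIPTORS = {"packed", "chopped", "sifted", "softened", "fresh", "peel", "zest", "juice"}
STOPWORDS = {"the", "of", "a", "an"}


def to_number(token):
    """Return the token as a Fraction ("2", "1/2", "0.5"), or None."""
    try:
        return Fraction(token)
    except (ValueError, ZeroDivisionError):
        return None


def lookup_unit(token):
    """Map a unit word or abbreviation to a canonical name, or None."""
    token = token.rstrip(".")
    if token in UNIT_ALIASES:  # exact match first keeps "T" and "t" distinct
        return UNIT_ALIASES[token]
    return UNIT_ALIASES.get(token.lower())


def parse_with_word_lists(line):
    amount, unit, item, description = None, None, [], []
    for token in re.findall(r"[^\s,]+", line):
        number = to_number(token)
        if number is not None and unit is None and not item:
            # consecutive leading numbers form a mixed number: "1 1/2" -> 3/2
            amount = number if amount is None else amount + number
        elif token.lower() in STOPWORDS:
            continue
        elif unit is None and not item and lookup_unit(token):
            unit = lookup_unit(token)
        elif token.lower() in DESCRIPTORS:
            description.append(token.lower())
        else:
            item.append(token)  # whatever the lists don't know is the item
    return {"amount": amount, "unit": unit, "item": " ".join(item),
            "description": " ".join(description) or None}
```

With these lists, "the peel of 2 lemons" comes out as amount 2, item "lemons", description "peel", and "1 1/2 c. flour, sifted" as amount 3/2, unit "cup", item "flour", description "sifted". Anything the lists miss lands in the item, which is the "assume edge cases will fail" trade-off from the comments.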