I am looking for published research on AI techniques for reading cookbook recipes. Recipes are a very limited domain, so a natural language processing engine might be able to handle them with a reasonable degree of accuracy.

I have in mind writing a program that would allow copy/pasting a recipe from a web browser into the AI and having it determine the title, author, ingredients, instructions, nutritional information, etc. by "reading" the recipe. I would also like to be able to process PDF files (I have a large collection), maybe also just using copy/paste.

The output will be some kind of (standard) XML-based format that can be read by a recipe organizer.

I have in mind PhD- or Master's-level work.

+2  A: 

One subfield of AI that you might find relevant is information extraction.

Information extraction algorithms often work by using rules (e.g. regular expressions) to identify entities and relations in text. These rules can either be defined by hand (e.g. the Suiseki algorithm) or learned with supervised machine learning algorithms (e.g. RAPIER, Wrapper Induction, Conditional Random Fields).
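To make the hand-written-rule idea concrete for recipes, here is a minimal sketch in Python. The pattern, the short unit list, and the function name `extract_ingredient` are my own assumptions for illustration; they are not taken from Suiseki or any published system, and a real ingredient parser would need far more coverage.

```python
import re

# Hypothetical hand-written rule: "<quantity> <unit> <ingredient name>".
# The unit list is illustrative only, not a complete vocabulary.
INGREDIENT_RULE = re.compile(
    r"^\s*(?P<quantity>\d+(?:\s+\d+/\d+|/\d+|\.\d+)?)\s+"  # 2, 1/2, 1 1/2, 0.5
    r"(?P<unit>cups?|tbsp|tsp|oz|g|kg|ml|lb)\.?\s+"        # optional trailing dot
    r"(?P<name>.+?)\s*$",
    re.IGNORECASE,
)

def extract_ingredient(line):
    """Return a dict of quantity/unit/name, or None if the rule does not fire."""
    match = INGREDIENT_RULE.match(line)
    return match.groupdict() if match else None

print(extract_ingredient("1 1/2 tsp. salt"))
print(extract_ingredient("Preheat the oven to 350 degrees."))
```

Lines that the rule does not match (such as instructions) come back as `None`, which is roughly how a rule-based extractor separates ingredient lines from everything else. Learned approaches replace the hand-written pattern with one induced from labeled examples.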


For example, an information extraction algorithm might grab data from a job posting:

Job Title: Senior DBMS Consultant
Location: Dallas,TX
Responsibilities: DBMS Applications consultant works with project teams to define DBMS based solutions that support the enterprise deployment of Electronic Commerce, Sales Force Automation, and Customer Service applications.
Desired Requirements: 3-5 years exp. developing Oracle or SQL Server apps using Visual Basic, C/C++, Powerbuilder, Progress, or similar. Recent experience related to installing and configuring Oracle or SQL Server in both dev. and deployment environments.
Desired Skills: Understanding of UNIX or NT, scripting language. Know principles of structured software engineering and project management

...and distill it into this template:

title: Senior DBMS Consultant
state: TX
city: Dallas
country: US
language: Powerbuilder, Progress, C, C++, Visual Basic
platform: UNIX, NT
application: SQL Server, Oracle
area: Electronic Commerce, Customer Service
required years of experience: 3
desired years of experience: 5
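As a rough sketch of how a few of those slots could be filled with hand-written rules, the Python below runs one regular expression per slot over an abbreviated copy of the posting. The rule set, the slot-to-pattern mapping, and the name `fill_template` are assumptions made for this example, not the method used by any particular system. Slots such as language or platform would more plausibly need a dictionary of known terms and are left out here.

```python
import re

# Abbreviated copy of the job posting above.
POSTING = """Job Title: Senior DBMS Consultant
Location: Dallas,TX
Desired Requirements: 3-5 years exp. developing Oracle or SQL Server apps."""

# One hypothetical hand-written rule per template slot.
RULES = {
    "title": re.compile(r"^Job Title:\s*(.+)$", re.M),
    "city": re.compile(r"^Location:\s*([^,\n]+),", re.M),
    "state": re.compile(r"^Location:[^,\n]+,\s*([A-Z]{2})\b", re.M),
    "required years of experience": re.compile(r"(\d+)-\d+ years"),
    "desired years of experience": re.compile(r"\d+-(\d+) years"),
}

def fill_template(text):
    """Apply each rule; slots whose rule does not match are simply omitted."""
    template = {}
    for slot, rule in RULES.items():
        match = rule.search(text)
        if match:
            template[slot] = match.group(1).strip()
    return template

for slot, value in fill_template(POSTING).items():
    print(f"{slot}: {value}")
```

The same shape would carry over to recipes: swap the slots for title, author, ingredients, and so on, and either write the rules by hand or learn them from annotated examples.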


Ray Mooney and his group at the University of Texas at Austin have done some great work in information extraction. Here are some references that might make good jumping-off points:

Nate Kohl