Hello,

Assuming that I know nothing at all and that I'm starting to program TODAY, what would I need to learn in order to start working with Natural Language Processing?

I've been struggling with some string-parsing methods, but so far they are just annoying me and leading me to write ugly code. I'm looking for fresh ideas on how to build something like the Remember The Milk API to parse user input, so that I can offer a fast data-entry form based on simple one-line phrases instead of fields.

EDIT: RTM is a to-do list system. To enter a task, you don't need to fill in each field (task name, due date, location, etc.) separately. You can simply type a phrase like "Dentist appointment monday at 2PM in WhateverPlace" and it will parse it and fill in all the fields for you.

I don't have any technical constraints, since it's going to be a personal project, but I'm most familiar with the .NET world. I'm not sure this is a matter of language, but if necessary I'm more than willing to learn a new one to do it.

My project is about personal finances, so the phrases are more like "Spent 10USD on Coffee last night with my girlfriend", and it would fill in the location, amount, tags, and other fields.

Thanks a lot for any direction you can give me!

+1  A: 

Have a look at NLTK; it's a good resource for beginner programmers interested in NLP. http://www.nltk.org/
It is written in Python, which is one of the easier programming languages.

Now that I understand your problem, here is my solution:

You can develop a kind of restricted vocabulary in which every amount must end with a $ sign and every time must be in the form 00:00 and/or end with AM/PM. To detect items, you can use a list of objects from an ontology such as OpenCyc, which can provide you with a list of objects such as beer, coffee, bread, milk, etc. This will help you detect objects in the short phrase. Still, it would be a very fuzzy approach.
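A minimal sketch of that restricted-vocabulary idea in Python, using only the standard `re` module. The patterns, the `parse_restricted` function, and the small `KNOWN_ITEMS` set are all assumptions made for illustration; in particular, the set is just a hand-written stand-in for an object list you might export from an ontology such as OpenCyc.

```python
import re

# Hypothetical stand-in for an object list taken from an ontology.
KNOWN_ITEMS = {"beer", "coffee", "bread", "milk"}

# Amount: digits (optionally with decimals) immediately followed by "$".
AMOUNT_RE = re.compile(r"\b(\d+(?:\.\d+)?)\$")

# Time: "HH:MM" with an optional AM/PM, or a bare hour that must carry AM/PM.
TIME_RE = re.compile(
    r"\b(\d{1,2}:\d{2}(?:\s?[AP]M)?|\d{1,2}\s?[AP]M)\b",
    re.IGNORECASE,
)


def parse_restricted(phrase):
    """Pull amount, time and known items out of a restricted-format phrase."""
    amount = AMOUNT_RE.search(phrase)
    time = TIME_RE.search(phrase)
    words = re.findall(r"[a-z]+", phrase.lower())
    return {
        "amount": float(amount.group(1)) if amount else None,
        "time": time.group(1) if time else None,
        "items": [w for w in words if w in KNOWN_ITEMS],
    }


print(parse_restricted("Spent 10$ on coffee at 08:30 PM"))
```

Anything that does not follow the agreed format is simply not recognized, which is the trade-off of this approach: the parser stays tiny, but the user has to learn the conventions.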

Akshay Bhat
Two simple side questions: do you think I need a full NLP approach to do something like that? And is this resource also valid for non-English languages?
tucaz
It depends on how deep a parse you want to achieve. Parsing free-form text, even short phrases, is really difficult, and you would need some kind of NLP model. However, if you restrict users to a smaller vocabulary, such that a number in the phrase is always a time or any amount must be followed by a $ sign, then I think you won't need an NLP solution. There are NLP models available for other languages, and I think NLTK might handle at least all European languages.
Akshay Bhat
+2  A: 

This does not appear to require full NLP. Simple pattern-based information extraction will probably suffice. The basic idea is to tokenize the text, then recognize/classify certain keywords, and finally recognize patterns/phrases.

In your example, tokenizing gives you "Dentist", "appointment", "monday", "at", "2PM", "in", "WhateverPlace". Your tool will recognize that "monday" is a day of the week, "2PM" is a time, etc. Finally, you can find patterns like [at] [TIME] and [in] [PLACE] and use those to fill in the fields.
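The three steps (tokenize, classify, match patterns) could be sketched in Python roughly as follows. The tag names, the `classify` and `extract` functions, and the field names are hypothetical, and the sketch assumes whitespace tokenization and a single-word place after "in"; a real tool would need richer rules.

```python
import re

DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}

# Matches tokens such as "2PM" or "10:30am".
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?(am|pm)$", re.IGNORECASE)


def classify(token):
    """Step 2: assign a coarse tag to a single token."""
    lower = token.lower()
    if lower in DAYS:
        return "DAY"
    if TIME_RE.match(token):
        return "TIME"
    if lower in {"at", "in"}:
        return lower.upper()  # keyword tags "AT" and "IN"
    return "WORD"


def extract(phrase):
    """Steps 1 and 3: tokenize, then fill fields from simple patterns."""
    tagged = [(tok, classify(tok)) for tok in phrase.split()]
    fields = {"task": [], "day": None, "time": None, "place": None}
    i = 0
    while i < len(tagged):
        tok, tag = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if tag == "AT" and nxt and nxt[1] == "TIME":    # [at] [TIME]
            fields["time"] = nxt[0]
            i += 2
        elif tag == "IN" and nxt and nxt[1] == "WORD":  # [in] [PLACE]
            fields["place"] = nxt[0]
            i += 2
        elif tag == "DAY":
            fields["day"] = tok.lower()
            i += 1
        else:
            # Anything not consumed by a pattern becomes part of the task name.
            fields["task"].append(tok)
            i += 1
    fields["task"] = " ".join(fields["task"])
    return fields


print(extract("Dentist appointment monday at 2PM in WhateverPlace"))
```

Tokens that no pattern consumes fall through into the task name, so the phrase degrades gracefully when nothing is recognized.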

A framework like GATE may help, but even that may be a larger hammer than you really need.

Aaron Novstrup