I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?

It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

Please, never send me a photo of Mr Jones.

Does anyone know where to start with this? Is it even possible?

I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.

Thanks!

+1  A: 

NLTK is a reasonable framework for parsing natural language, but be warned that this is not a simple problem. Work like this is really research-level programming.

It gets much easier if you have a very limited domain. If, say, your application only deals with information about famous writers, you can avoid some of the complexities of natural language, such as certain types of ambiguity.
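To give a feel for how far a narrow domain lets you get with simple rules, here is a minimal sketch in plain Python (no NLTK). It assumes a hand-written keyword list and a hand-written list of negation cues, both hypothetical and both something you would tune for your own questions. It only checks whether a negation cue appears earlier in the same sentence than the keyword, so it is nowhere near real language understanding.

```python
import re

# Hypothetical keyword and negation lists; tune these for your own domain.
TOPIC_KEYWORDS = {"photo", "photos", "photograph", "photographs", "picture", "pictures"}
NEGATION_CUES = {"never", "not", "no", "don't", "dont", "without"}


def split_sentences(text):
    """Very rough sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def requests_topic(text):
    """Return True if some sentence mentions the topic with no negation cue before it."""
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in TOPIC_KEYWORDS:
                # Only look at the words that come before the keyword.
                if not any(t in NEGATION_CUES for t in tokens[:i]):
                    return True
    return False


asked = requests_topic(
    "It would also be useful if I could have a description of his "
    "appearance, or even better a photograph?"
)
refused = requests_topic("Please, never send me a photo of Mr Jones.")
print(asked, refused)
```

A rule like this will still misfire on sentences such as "I never said I don't want a photo", which is exactly the kind of complexity a limited domain lets you mostly ignore.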

Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option) but I'd imagine that iTunes U would have a course on the topic. If not I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01

Jakub Hampl
Hi Jakub, thanks for the quick reply. I tried Google, but I didn't really know what I was looking for. I've seen terms like Semantic Interpretation and Garden Path sentences but it doesn't seem to quite fit any of these, which makes Googling tough. The domain would be really limited. I'd want to ask about 5 or 6 pre-defined questions along the lines of the one given above. Does this make life easier? iTunes U is an excellent idea; I will definitely take a look. If you could post a link to your course too, that would be a fantastic help.
Nick
The course I attended will help you understand many of the principles, like semantics and garden path sentences. I'd search iTunes U for more concrete info.
Jakub Hampl
Excellent, thanks for the link. Think I'm in for a lot of reading...
Nick
+1  A: 

The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.

I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.

Ventzi Zhechev
+1  A: 

The problem you are tackling is very challenging.

I would start by identifying the entities in the text (a problem known as Named Entity Recognition; google it), and then I would try to identify concepts.

If you want to identify roughly what the text is about, I suggest you start with WordNet and use the words and their places in its hierarchy to identify the concepts involved. If you want to produce a system that shows real intelligence, then you should start researching resources such as CYC (OpenCYC), which will allow you to convert the sentences into first-order logic (FOL) sentences.

This is the hardcore AI approach to solving your problem. For a simple chat bot, it would be easier to rely on simple statistical methods.
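As one illustration of what a "simple statistical method" could look like, here is a minimal sketch of a bag-of-words Naive Bayes classifier with add-one smoothing, written in plain Python. The choice of Naive Bayes, the label names and the six training sentences are all assumptions made up for this example; a real system would need far more labelled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokens:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for this sketch.
TRAINING_EXAMPLES = [
    ("Could you send me a photograph of him?", "wants_photo"),
    ("I would like a picture please", "wants_photo"),
    ("Please send a photo", "wants_photo"),
    ("Never send me a photo", "no_photo"),
    ("I do not want any pictures", "no_photo"),
    ("Don't send photographs", "no_photo"),
]

classifier = NaiveBayes()
classifier.train(TRAINING_EXAMPLES)
print(classifier.predict("Could you send a picture?"))
print(classifier.predict("Never send photographs."))
```

With this little data the classifier is easily fooled: a single polite word like "please" can outweigh "never". Treat it as a starting point for experiments, not as a working solution.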

Good luck.

A: 

Posted something here by mistake

smitten