I need help determining the best approach for classifying domain-specific sentences (e.g. movie reviews) as "positive" or "negative". I've seen libraries such as OpenNLP before, but they're too low-level: OpenNLP just gives me the basic sentence composition. What I need is a higher-level structure:

  • hopefully with word lists
  • hopefully trainable on my own set of data

Thanks!

A: 

Good luck with that. I can't see how even with word lists you'll be able to deal with wit and sarcasm, and of course movie reviews love to use such things.

  • "I laughed so hard my sides hurt".
  • "It hurt to watch, I'm laughing at the director right now."

But if you do solve this problem I suggest you go on to write a program that can tell if jokes are funny or not.

Jeff
I'm sorry, that's not useful. I'm interested in the general case: some approach that can give an 80% confidence. Obviously, computers can't understand natural language perfectly.
Alex
+1  A: 

Try stemming with WordNet; you may have to augment the vocabulary with positive/negative weights to get what you want, though.
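
To make the weighted-vocabulary idea concrete, here is a minimal sketch in plain Python. It does not use WordNet: `crude_stem` is a hypothetical stand-in for a real stemmer, and the words and weights in `WEIGHTS` are invented for illustration only.

```python
import re

# Hypothetical hand-made word list; a real one would be far larger.
WEIGHTS = {
    "great": 2.0, "good": 1.0, "funny": 1.0, "laugh": 1.0,
    "boring": -1.5, "bad": -1.0, "awful": -2.0,
}

def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def score(text):
    """Sum the weights of known words; try the exact word first, then its stem."""
    total = 0.0
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in WEIGHTS:
            total += WEIGHTS[token]
        else:
            total += WEIGHTS.get(crude_stem(token), 0.0)
    return total

print(score("A great cast, but the plot was boring and the jokes were bad."))
print(score("It hurt to watch, I'm laughing at the director right now."))
```

The second sentence scores as positive because of "laughing", which is exactly the sarcasm problem raised earlier in the thread.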

Steven A. Lowe
+9  A: 

What you are looking for is commonly dubbed Sentiment Analysis. Typically, sentiment analysis is not able to handle delicate subtleties, like sarcasm or irony, but it fares pretty well if you throw a large set of data at it.

Sentiment analysis usually needs quite a bit of pre-processing: at least tokenization, sentence boundary detection and part-of-speech tagging. Sometimes syntactic parsing can be important too. Doing it properly is an entire branch of research in computational linguistics, and I wouldn't advise coming up with your own solution unless you take the time to study the field first.
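
As a rough illustration of the first two pre-processing steps, the sketch below does naive sentence boundary detection and tokenization with regular expressions. The function names are made up for this example, and part-of-speech tagging is left out because it needs a trained model from a toolkit.

```python
import re

def split_sentences(text):
    """Naive boundary detection: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase word tokens; punctuation other than apostrophes is dropped."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

review = "I laughed so hard my sides hurt. Would I watch it again? Absolutely!"
for sentence in split_sentences(review):
    print(tokenize(sentence))
```

A splitter this simple breaks on abbreviations such as "Mr. Smith", which is one reason the toolkits ship trained sentence detectors instead.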

OpenNLP has some tools to aid sentiment analysis, but if you want something more serious, you should look into the LingPipe toolkit. It has some built-in SA-functionality and a nice tutorial. And you can train it on your own set of data, but don't think that it is entirely trivial :-).

Googling for the term will probably also give you some resources to work with. If you have any more specific question, just ask, I'm watching the nlp-tag closely ;-)

Aleksandar Dimitrov
Amazingly useful - thanks a bunch, Aleksandar!
Alex
+1  A: 

A Naïve Bayesian Classifier would do the job, and it's very simple to implement.
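
For a sense of how little code that takes, here is a minimal multinomial Naive Bayes sketch with add-one smoothing, in plain Python. The class name, the labels and the four training reviews are invented for illustration; a usable model needs far more data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("a wonderful, funny film", "pos")
nb.train("great acting and a great story", "pos")
nb.train("boring plot and awful acting", "neg")
nb.train("a dull, awful mess", "neg")
print(nb.classify("a funny story with great acting"))
print(nb.classify("dull and boring"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from zeroing out a class.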

Osama ALASSIRY
+1  A: 

Alex,

Some approaches to sentiment analysis use strategies that are popular in other text classification tasks. The most common is to transform each film review into a word vector and feed those vectors into a classifier algorithm as training data. Most popular data mining packages can help you here. You could have a look at this tutorial on sentiment classification, which illustrates how to run an experiment using the open source RapidMiner toolkit.
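
The word-vector step can be sketched as a simple bag-of-words count in plain Python. This is only one common way to build such vectors, not necessarily what RapidMiner does internally, and the helper names and sample reviews are made up for the example.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(reviews):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for review in reviews for word in tokenize(review)})
    return {word: index for index, word in enumerate(words)}

def to_vector(review, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word, count in Counter(tokenize(review)).items():
        if word in vocab:
            vector[vocab[word]] = count
    return vector

reviews = ["good good film", "bad film"]
vocab = build_vocabulary(reviews)
vectors = [to_vector(review, vocab) for review in reviews]
print(vocab)
print(vectors)
```

Each review becomes a fixed-length row of counts, which is the shape most classifier implementations expect as input.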

Incidentally, there is a good data set made available for research purposes related to detecting opinion in film reviews. It is based on IMDB user reviews, and you can find plenty of related research in the area and see how it uses the data set.

It's worth bearing in mind that the effectiveness of these methods can only be judged from a statistical viewpoint, so you can pretty much assume there will be misclassifications and cases where opinion is hard to detect. As already noted in this thread, detecting things like irony and sarcasm can be very difficult indeed.
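
Judging a classifier statistically usually means comparing its predictions with hand-labelled reviews it was not trained on. The sketch below computes accuracy, precision and recall in plain Python; the `evaluate` helper and the label lists are invented for illustration.

```python
def evaluate(gold, predicted, positive="pos"):
    """Compare predicted labels with gold labels for one held-out test set."""
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == p)
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    return {
        "accuracy": correct / len(pairs),
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }

# Hypothetical gold labels and classifier output for five test reviews.
gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]
metrics = evaluate(gold, predicted)
print(metrics)
```

Accuracy alone can mislead when one class dominates the data, which is why precision and recall are reported alongside it.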

B.

bohana