views:

84

answers:

3

Hi,

I've been reading alot of articles that explain the need for an initial set of texts that are classified as either 'positive' or 'negative' before a sentiment analysis system will really work.

My question is: Has anyone attempted just doing a rudimentary check of 'positive' adjectives vs 'negative' adjectives, taking into account any simple negators to avoid classing 'not happy' as positive? If so, are there any articles that discuss just why this strategy isn't realistic?

Thanks in advance, Dave --Trindaz on Fedang #sentiment-analysis

+1  A: 

I haven't tried doing untrained sentiment analysis such as you are describing, but off the top of my head I'd say you're oversimplifying the problem. Simply analyzing adjectives is not enough to get a good grasp of the sentiment of a text; for example, consider the word 'stupid.' Alone, you would classify that as negative, but if a product review were to have '... [x] product makes their competitors look stupid for not thinking of this feature first...' then the sentiment in there would definitely be positive. The greater context in which words appear definitely matters in something like this. This is why an untrained bag-of-words approach alone (let alone an even more limited bag-of-adjectives) is not enough to tackle this problem adequately.

The pre-classified data ('training data') helps in that the problem shifts from trying to determine whether a text is of positive or negative sentiment from scratch, to trying to determine if the text is more similar to positive texts or negative texts, and classify it that way. The other big point is that textual analyses such as sentiment analysis are often affected greatly by the differences of the characteristics of texts depending on domain. This is why having a good set of data to train on (that is, accurate data from within the domain in which you are working, and is hopefully representative of the texts you are going to have to classify) is as important as building a good system to classify with.

Not exactly an article, but hope that helps.

waffle paradox
Thanks for your response waffle! I appreciate all the input I can get on this topic.
Trindaz
A: 

I tried spotting keywords using a dictionary of affect to predict the sentiment label at sentence level. Given the generality of the vocabulary (non domain dependent), the results were just about 61%. The paper is available in my homepage.

In a somewhat improved version, negation adverbs were considered. The whole system, named EmoLib, is available for demo:

http://dtminredis.housing.salle.url.edu:8080/EmoLib/

Regards,

atrilla
Thanks for this atrilla. It ran pretty well for the testing I did.
Trindaz
+5  A: 

A classic paper by Peter Turney (2002) explains a method to do unsupervised sentiment analysis (positive/negative classification) using only the words excellent and poor as a seed set. Turney uses the mutual information of other words with these two adjectives to achieve an accuracy of 74%.

larsmans
This one gets the answer tag. It's a very interesting article.
Trindaz