I don't know whether Stack Overflow covers NLP, so I am going to give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine whether camera reviews are positive or negative for a particular attribute of the camera (such as image quality) in each review.

However, not everybody uses the exact wording "image quality" in their posts, so I want to see if there is a way for me to build something like this:

"image quality", which includes ("noise", "color", "sharpness", etc.), so I can group everything under one big umbrella.

I am doing this for another language, so WordNet is not necessarily helpful. And no, I do not work for Google or Microsoft, so I do not have data on people's clicking behavior as input either.

However, I do have a lot of text that is already POS-tagged, segmented, etc.

A: 

Take a look at Latent Semantic Indexing (http://en.wikipedia.org/wiki/Latent_semantic_indexing); it specifically addresses your problem. However, you still need to come up with some way to correlate these meta-concepts with positive or negative sentiment. Sentiment analysis (http://en.wikipedia.org/wiki/Sentiment_analysis) should help you there.
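
To make the LSI suggestion concrete, here is a minimal sketch, assuming NumPy is available and using a made-up toy corpus of tokenized reviews. The helper names (`build_term_doc_matrix`, `lsi_term_vectors`, `cosine`) are hypothetical, and raw counts stand in for the tf-idf weighting a real setup would more likely use. It shows that two terms which never share a document can still end up close together after a truncated SVD.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Return (term -> row index, term-by-document count matrix)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            A[index[tok], j] += 1.0
    return index, A

def lsi_term_vectors(A, k):
    """Project each term onto the top-k latent dimensions (truncated SVD)."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * S[:k]

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

# Toy corpus (invented for illustration): "noise" and "sharpness" never
# appear in the same document, but both co-occur with "image".
docs = [
    ["noise", "image"],
    ["sharpness", "image"],
    ["battery", "charge"],
    ["battery", "charge"],
    ["battery", "charge"],
]
index, A = build_term_doc_matrix(docs)
T = lsi_term_vectors(A, k=2)

# Similarity on raw document counts vs. similarity in the latent space.
raw_sim = cosine(A[index["noise"]], A[index["sharpness"]])
lsi_sim = cosine(T[index["noise"]], T[index["sharpness"]])
cross_sim = cosine(T[index["noise"]], T[index["battery"]])
print(raw_sim, lsi_sim, cross_sim)
```

On this toy data the raw similarity between "noise" and "sharpness" is zero, while in the two-dimensional latent space they become nearly identical and stay unrelated to "battery". On real reviews the number of dimensions `k` is something you would have to tune.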

Vlad
+1  A: 

Re your comment:

  1. Classification through machine learning is used in NLP all the time.
  2. Regarding semantic similarity between concepts, see Dekang Lin's information-theoretic definition of similarity.

Please also see these questions: finding related words, semantic similarity of two phrases.
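
As a rough illustration of Lin's measure in its feature-based form, as I read it: sim(w1, w2) = 2 * I(F(w1) ∩ F(w2)) / (I(F(w1)) + I(F(w2))), where F(w) is the set of features of w and I(S) is the sum of -log P(f) over the features in S. Lin's paper takes features from dependency triples. The sketch below substitutes context words within a small window, which is my simplifying assumption, and estimates P(f) as the fraction of words that have feature f. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_features(sentences, window=2):
    """Map each word to the set of words seen within `window` positions of it."""
    feats = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    feats[w].add(tokens[j])
    return dict(feats)

def lin_similarity(w1, w2, feats):
    """Lin-style similarity: shared information over total information."""
    n_words = len(feats)
    # Number of words that carry each feature; used to estimate P(f).
    df = Counter(f for fs in feats.values() for f in fs)

    def info(fs):
        return -sum(math.log(df[f] / n_words) for f in fs)

    f1, f2 = feats.get(w1, set()), feats.get(w2, set())
    denom = info(f1) + info(f2)
    return 2.0 * info(f1 & f2) / denom if denom > 0 else 0.0

# Toy corpus, invented for illustration.
sentences = [
    ["the", "noise", "ruins", "image", "quality"],
    ["the", "sharpness", "improves", "image", "quality"],
    ["the", "battery", "lasts", "all", "day"],
]
feats = context_features(sentences)
sim_related = lin_similarity("noise", "sharpness", feats)
sim_unrelated = lin_similarity("noise", "battery", feats)
print(sim_related, sim_unrelated)
```

Here "noise" scores higher against "sharpness" than against "battery" because it shares the rarer feature "image" with the former and only the common feature "the" with the latter. Since you already have POS-tagged text, features built from tags or parsed relations would be closer to the original formulation.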

Yuval F
A: 

You might want to take a look at the book Opinion Mining and Sentiment Analysis. If you are only interested in the similarity of words and phrases, this survey paper may help you: From Frequency to Meaning: Vector Space Models of Semantics.
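
For a feel of what a word-context vector space model looks like in practice, here is a small sketch using only the standard library. It is one common recipe (co-occurrence counts, positive PMI weighting, cosine similarity), not necessarily the exact setup any particular paper recommends. The window size, the helper names and the toy corpus are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(sentences, window=2):
    """Count how often each word appears within `window` positions of another."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][tokens[j]] += 1
    return counts

def ppmi_vectors(counts):
    """Reweight counts with positive pointwise mutual information."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    # The window is symmetric, so row totals also serve as context totals.
    row = {w: sum(ctx.values()) for w, ctx in counts.items()}
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (row[w] * row[c]))
            if pmi > 0:
                vec[c] = pmi
        vectors[w] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

# Toy corpus, invented for illustration.
sentences = [
    ["the", "noise", "ruins", "image", "quality"],
    ["the", "sharpness", "improves", "image", "quality"],
    ["the", "battery", "lasts", "all", "day"],
]
vectors = ppmi_vectors(cooccurrence(sentences))
sim_related = cosine(vectors["noise"], vectors["sharpness"])
sim_unrelated = cosine(vectors["noise"], vectors["battery"])
print(sim_related, sim_unrelated)
```

To build the "image quality" umbrella from the question, one option would be to rank all words by their similarity to a few seed terms and keep those above a threshold you pick by inspection.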

ephes