I assume a natural language processor would be needed to parse the text itself, but what suggestions do you have for an algorithm to detect a user's mood based on text that they have written? I doubt it would be very accurate, but I'm interested nonetheless.

EDIT: I am by no means an expert on linguistics or natural language processing, so I apologize if this question is too general or stupid.

+2  A: 

I can't believe I'm taking this seriously... assuming a one-dimensional mood space:

  • If the text contains a curse word, -10 mood.
  • I think exclamations would tend to be negative, so -2 mood.
  • When I get frustrated, I type in Very. Short. Sentences. -5 mood.

The more I think about this, the more it's clear that a lot of these signifiers indicate extreme mood in general, but it's not always clear what kind of mood.
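
A minimal sketch of the three rules above, taking them at face value. The curse-word list, the function name `mood_score` and the "three or more sentences averaging two words or fewer" threshold are placeholders I made up for illustration; only the -10/-2/-5 weights come from the list.

```python
import re

# Hypothetical word list; a real one would be much longer.
CURSE_WORDS = {"damn", "hell", "crap"}

def mood_score(text):
    """Score text on a one-dimensional mood axis (negative = worse mood)."""
    score = 0
    words = re.findall(r"[a-z']+", text.lower())
    if any(w in CURSE_WORDS for w in words):
        score -= 10  # the text contains a curse word
    if "!" in text:
        score -= 2   # the text contains an exclamation mark
    # Split into sentences on ., ! or ? and measure their length in words.
    sentences = [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]
    if sentences:
        avg_len = sum(len(s) for s in sentences) / len(sentences)
        # Assumed threshold for "Very. Short. Sentences."
        if len(sentences) >= 3 and avg_len <= 2:
            score -= 5
    return score
```

As the paragraph above says, these cues probably measure intensity more than direction, so a score like this is a toy starting point at best.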

Michael Petrotta
I curse when I'm happy :)
Brian Gianforcaro
If male, ! is -2. If female, ! does nothing since girls tend to use ! as .
rpflo
How do you recognize if it was written by a female, then?
J S
@J S: Easy: you just ask another question on SO: "Is it possible to guess a user's gender based on the structure of text?"
luvieere
+1  A: 

Yes.

Whether or not you can do it is another story. The problem seems at first to be AI complete.

Now then, if you had keystroke timings you should be able to figure it out.
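
One possible reading of "keystroke timings", offered as an assumption rather than a tested method: measure how much the gaps between key presses vary and compare that with a per-user baseline. The function names, the millisecond timestamps and the `factor=2.0` threshold are all hypothetical.

```python
from statistics import mean, pstdev

def interval_variation(timestamps):
    """Coefficient of variation of the gaps between consecutive key presses."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)

def looks_agitated(timestamps, baseline, factor=2.0):
    """Flag a sample whose variation is well above the user's calibrated baseline.

    `baseline` is the variation measured while the user was known to be calm;
    `factor` is an arbitrary illustrative threshold.
    """
    return interval_variation(timestamps) > factor * baseline
```

For example, `looks_agitated([0, 50, 900, 950, 1800], baseline=0.3)` flags typing that alternates between very fast and very slow gaps, while evenly spaced key presses are not flagged.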

Joshua
Keystroke timings? How, exactly?
Michael Petrotta
You'll probably have to calibrate, but you should be able to infer mood from variations in keystroke timings within a few sentences. In particular, anger tends to vary between two extremes.
Joshua
I have no idea what the "AI complete" bit is about, but the rest more or less covers my thoughts on the matter
BCS
+10  A: 

No doubt it is possible to judge a user's mood based on the text they type but it would be no trivial thing. Things that I can think of:

  • Capitals tend to signify agitation, annoyance or frustration and are certainly an emotional response. Then again, some newbies type in capitals because they don't realize the significance, so you couldn't assume that without looking at what else they've written (to make sure it's not all in caps);
  • Capitals are really just one form of emphasis. Others are the use of certain aggressive colours (e.g. red) or of bold or larger fonts;
  • Some people make more spelling and grammar mistakes and typos when they're highly emotional;
  • Scanning for emoticons could give you a very clear picture of what the user is feeling but again something like :) could be interpreted as happy, "I told you so" or even have a sarcastic meaning;
  • Use of expletives tends to have a clear meaning, but again it's not clear-cut. Colloquial speech by many people will routinely contain certain four-letter words. Other people might not even say "hell", saying "heck" instead, so for them any expletive (even "sucks") is significant;
  • Groups of punctuation marks (like @#$@$@) tend to be substituted for expletives in contexts where expletives aren't appropriate, so that's less likely to be colloquial;
  • Exclamation marks can indicate surprise, shock or exasperation.
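
Several of the cues in this list can be pulled out of plain text with a few lines of code. This is only a sketch: the function name `surface_cues`, the tiny emoticon pattern and the "three or more symbols in a row" rule are my own illustrative choices, and it says nothing about how to interpret the numbers (which, as noted above, is the hard part).

```python
import re

# Deliberately tiny emoticon pattern: :) ;) :( :-) :D :P and similar.
EMOTICON_RE = re.compile(r"[:;]-?[()DP]")
# Runs of symbols such as @#$@$@ standing in for an expletive.
SYMBOL_RUN_RE = re.compile(r"[@#$%&*]{3,}")

def surface_cues(text):
    """Collect simple surface features; interpretation is left to the caller."""
    letters = [c for c in text if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "caps_ratio": caps_ratio,              # share of letters that are capitals
        "exclamations": text.count("!"),       # number of exclamation marks
        "emoticons": EMOTICON_RE.findall(text),
        "symbol_runs": len(SYMBOL_RUN_RE.findall(text)),
    }
```

Comparing `caps_ratio` against the same user's other posts, rather than against a fixed threshold, would address the "newbie who always types in caps" caveat.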

You might want to look at Advances in written text analysis or even Determining Mood for a Blog by Combining Multiple Sources of Evidence.

Lastly it's worth noting that written text is usually perceived to be more negative than it actually is. This is a common problem with email communication in companies, just as one example.

cletus
+1  A: 

My memory isn't good on this subject, but I believe I saw some research relating the grammatical structure of a text to its overall tone. The cues could be as simple as shorter words and words that express emotion (well, expletives are pretty obvious).

Edit: I noticed that the first person to answer made a substantially similar post. There could indeed be something serious in the idea about shorter sentences.

ilya n.
+1  A: 

Analysis of mood and behavior is a very serious science. Despite the other answers mocking the question, law enforcement agencies have been investigating the categorization of mood for years. The computer applications I have heard of generally had more context (timing information, voice pattern, speed in changing channels). I think that you could, with some success, determine whether a user is in a particular mood by training a neural network with samples from two known groups: angry and not angry. Good luck with your efforts.

ojblass
+1  A: 

I agree with ojblass that this is a serious question.

Mood categorization is currently a hot topic in the speech recognition area. If you think about it, an interactive voice response (IVR) application needs to handle angry customers far differently than calm ones: angry people should be routed quickly to human operators with the right experience and training. Vocal tone is a pretty reliable indicator of emotion, practical enough so that companies are eager to get this to work. Google "speech emotion recognition", or read this article to find out more.

The situation should be no different in web-based GUIs. Referring back to cletus's comments, the analogies between text and speech emotion detection are interesting. If a person types in CAPITALS they are said to be 'shouting', just as if their voice rose in volume and pitch when using a voice interface. Detecting typed profanities is analogous to "keyword spotting" of profanity in speech systems. If a person is upset, they'll make more errors using either a GUI or a voice user interface (VUI) and can be routed to a human.

There's a "multimodal" emotion detection research area here. Imagine a web interface that you can also speak to (along the lines of the IBM/Motorola/Opera XHTML + Voice Profile prototype implementation). Emotion detection could be based on a combination of cues from the speech and visual input modality.

Jim Ferrans
+1  A: 

I think my algorithm is rather straightforward: why not count the smileys in the text, :) vs :(?

Obviously, the text ":) :) :) :)" resolves to a happy user, while ":( :( :(" will surely resolve to a sad one. Enjoy!
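
For what it's worth, the counting takes only a few lines. The function name `smiley_mood` and the "neutral" fallback for ties are my own additions.

```python
def smiley_mood(text):
    """Compare the number of happy and sad smileys in the text."""
    happy = text.count(":)") + text.count(":-)")
    sad = text.count(":(") + text.count(":-(")
    if happy > sad:
        return "happy"
    if sad > happy:
        return "sad"
    return "neutral"  # tie, or no smileys at all
```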

+17  A: 

This is the basis of an area of natural language processing called sentiment analysis. Although your question is general, it's certainly not stupid - this sort of research is done by Amazon on the text in product reviews for example.

If you are serious about this, then a simple version could be achieved by -

  1. Acquire a corpus of positive/negative sentiment. If this were a professional project you might take some time and manually annotate a corpus yourself, but if you are in a hurry or just want to experiment with this at first then I'd suggest looking at the sentiment polarity corpus from Bo Pang and Lillian Lee's research. The issue with using that corpus is that it is not tailored to your domain (specifically, the corpus uses movie reviews), but it should still be applicable.

  2. Split your dataset into sentences labelled either Positive or Negative. For the sentiment polarity corpus you could split each review into its component sentences and then apply the overall sentiment polarity tag (positive or negative) to all of those sentences. Split this corpus into two parts: 90% for training and 10% for testing. If you're using Weka then it can handle the splitting of the corpus for you.

  3. Apply a machine learning algorithm (such as SVM, Naive Bayes or Maximum Entropy) to the training corpus at the word level. This is called a bag-of-words model, which simply represents each sentence as the words it is composed of. This is the same model that many spam filters run on. For a nice introduction to machine learning algorithms there is an application called Weka that implements a range of these algorithms and gives you a GUI to play with them. You can then test the performance of the machine-learned model from the errors made when attempting to classify your test corpus with this model.

  4. Apply this machine learning algorithm to your user posts. For each user post, separate the post into sentences and then classify them using your machine learned model.
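
To make steps 3 and 4 concrete, here is a rough sketch of a bag-of-words Naive Bayes classifier in plain Python (so not Weka, SVMlight or LingPipe). The class and function names are hypothetical, it uses add-one smoothing as one common choice, and any training sentences you feed it in a quick experiment are a stand-in for a real corpus such as the sentiment polarity one.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase the sentence and return its words (the 'bag of words')."""
    return re.findall(r"[a-z']+", sentence.lower())

class NaiveBayes:
    def fit(self, sentences, labels):
        """Count how often each word occurs under each label."""
        self.word_counts = {}
        self.label_counts = Counter(labels)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(sentence):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        """Return the label with the highest log-probability for the sentence."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n / total)                     # log prior
            for word in tokenize(sentence):
                if word in self.vocab:                      # skip unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage would look like `model = NaiveBayes().fit(train_sentences, train_labels)` followed by `model.predict(sentence)` for each sentence of a user post; the held-out 10% is then used to count how often the prediction disagrees with the known label.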

So yes, if you are serious about this then it is achievable - even without past experience in computational linguistics. It would be a fair amount of work, but even with word based models good results can be achieved.

If you need more help feel free to contact me - I'm always happy to help others interested in NLP =]


Small Notes -

  1. Merely splitting a segment of text into sentences is a field of NLP in its own right, called sentence boundary detection. There are a number of tools, OSS or free, available to do this, but for your task a simple split on whitespace and punctuation should be fine.
  2. SVMlight is another machine learner to consider, and in fact their inductive SVM example tackles a similar task to what we're looking at: trying to classify which Reuters articles are about "corporate acquisitions" with 1000 positive and 1000 negative examples.
  3. Turning the sentences into features to classify over may take some work. In this model each word is a feature - this requires tokenizing the sentence, which means separating words and punctuation from each other. Another tip is to lowercase all the separate word tokens so that "I HATE you" and "I hate YOU" both end up being considered the same. With more data you could try and also include whether capitalization helps in classifying whether someone is angry, but I believe words should be sufficient at least for an initial effort.
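
Notes 1 and 3 can be sketched with two regular expressions. This is the "simple split" version, not real sentence boundary detection (it will trip over abbreviations like "e.g."), and the function names are just for illustration.

```python
import re

def split_sentences(text):
    """Naive sentence splitting: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase, then separate words from punctuation marks."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", sentence.lower())
```

With this, `tokenize("I HATE you.")` and `tokenize("I hate YOU.")` produce the same token list, which is the point of lowercasing.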


Edit

I just discovered LingPipe, which in fact has a tutorial on sentiment analysis using the Bo Pang and Lillian Lee sentiment polarity corpus I was talking about. If you use Java that may be an excellent tool to use, and even if not it goes through all of the steps I discussed above.

Smerity
+1  A: 

If you support fonts, bold red text is probably an angry user. Green, regular-sized text with butterfly clip art is a happy one.

Alex
A: 

Fuzzy logic will do, I guess. Anyway, it will be quite easy to start with several rules for determining the user's mood and then extend and combine the "engine" with more accurate and sophisticated ones.
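
A very rough sketch of what a first pair of fuzzy rules might look like. The inputs, the breakpoints (0.3, 0.8, 3, 2) and the rule itself are invented for illustration; the only standard part is using min for fuzzy AND and max for fuzzy OR.

```python
def ramp(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def anger_degree(caps_ratio, exclamations, expletives):
    """Rule: angry if (shouting AND emphatic) OR swearing, as a degree in [0, 1]."""
    shouting = ramp(caps_ratio, 0.3, 0.8)   # share of capital letters
    emphatic = ramp(exclamations, 0, 3)     # number of exclamation marks
    swearing = ramp(expletives, 0, 2)       # number of expletives
    # Fuzzy AND = min, fuzzy OR = max.
    return max(min(shouting, emphatic), swearing)
```

More rules can be added as further terms inside the `max`, which is roughly the "extend and combine" step.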

bv