Hi,

I need to analyze a user's post and categorize it. For example, I have to categorize every post as a "buy" post or a "sell" post based on its text: "I'm looking to sell my house" is categorized as "sell". The problem is that often it's not so simple: "I'm looking to get rid of my old house" also needs to be categorized as "sell", while "I'm looking for a house" becomes "buy". I would also like to categorize these posts by the item in question. For example, that last post would be categorized as both "buy" and "house".

Can anyone recommend a good approach / good framework / technique when it comes to analyzing and understanding user input? Thanks.

+3  A: 

What you're describing is basically a Bayesian filtering problem; the same technique is used for spam filtering. See also this talk. It's a reasonably complicated area.
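
To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier in plain Python, trained on a handful of made-up posts. Everything in it is an assumption for illustration only: the `NaiveBayesClassifier` class, the `tokenize` helper, and the training sentences are hypothetical, and a real system would need far more labeled data and better tokenization.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training posts
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior: how common this label is in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Smoothed log likelihood, so unseen words never give log(0).
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training posts (assumption: you have labeled examples like these).
TRAINING_POSTS = [
    ("I'm looking to sell my house", "sell"),
    ("I want to get rid of my old car", "sell"),
    ("Selling my old bike, make an offer", "sell"),
    ("I'm looking for a house", "buy"),
    ("I want to buy a car", "buy"),
    ("Searching for a cheap bike to purchase", "buy"),
]

classifier = NaiveBayesClassifier()
for post, label in TRAINING_POSTS:
    classifier.train(post, label)

print(classifier.classify("I'm looking to get rid of my old house"))
```

The classifier never sees the word "sell" in the test sentence; it picks the label whose training posts share the most likely words ("get", "rid", "old", "my"). The same class could be trained a second time with item labels ("house", "car", ...) instead of buy/sell labels to get the second categorization, though with this little data the results would be fragile.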

cletus
+2  A: 

You're right; it's a hard thing to do.

Yahoo! has a Term Extraction API/Web service you can use. It's a pretty good way to run language analysis on your own text without writing a million lines of code to do it yourself. I haven't used it, so I don't know how well it handles different phrasings with the same meaning, which is what your question asks about.

Jeremy Smyth