views:

1531

answers:

4

I'm looking to do some sentence analysis (mostly for twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?

Similar to http://stackoverflow.com/questions/870460/java-is-there-a-good-natural-language-processing-library but for Ruby. I'd prefer something very general, but any leads are appreciated!

+6  A: 

There are some things at Ruby Linguistics and some links therefrom, though it doesn't seem anywhere close to what NLTK is for Python, yet.

Alex Martelli
+1  A: 

You need to be much more specific about what these "general characteristics" are.

In NLP "general characteristics" of a sentence can mean a million different things - sentiment analysis (ie, the attitude of the speaker), basic part of speech tagging, use of personal pronoun, does the sentence contain active or passive verbs, what's the tense and voice of the verbs...

I don't mind if you're vague about describing it, but if we don't know what you're asking it's highly unlikely we can be specific in helping you.

My general suggestion, especially for NLP, is you should get the tool best designed for the job instead of limiting yourself to a specific language. Limiting yourself to a specific language is fine for some tasks where the general tools are implemented everywhere, but NLP is not one of those.

The other issue in working with Twitter is a great deal of the sentences there will be half baked or compressed in strange and wonderful ways - which most NLP tools aren't trained for. To help there, the NUS SMS Corpus consists of "about 10,000 SMS messages collected by students". Due to the similar restrictions and usage, analysing that may be helpful in your explorations with Twitter.

If you're more specific I'll try and list some tools that will help.

Smerity
+4  A: 

You can always use jruby and use the java libraries.

EDIT: Why was this downvoted? The ability to do ruby natively on the jvm and easily leverage java libraries is a big plus for rubyists. This is a good option that should be considered in a situation like this.

jshen
+2  A: 

I found an excellent article detailing some NLP algorithms in Ruby here. This includes stemmers, date time parsers and grammar parsers.

Joey Robert