hey guys, my requirements are pretty similar to this:

Requirements

http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing

Using Solr

While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP.

I thought of SOLR because:

  1. It's got a bunch of tokenizers and performs a lot of NLP.
  2. It's pretty easy to use out of the box.
  3. It's a RESTful distributed app, so it's easy to hook up.
  4. I've spent some time with it, so using it could save me time.

Can I use Solr?

Although the above reasons are good, I don't know SOLR THAT well, so I need to know if it would be appropriate for my requirements.

Ideal Usage

Ideally, I'd like to configure SOLR, and then be able to send SOLR some text and retrieve the indexed, tokenized content.

Context

So you guys know, I'm working on a small component of a bigger recommendation engine.

+2  A: 

I guess you can use Solr and combine it with other tools. Tokenization, stop word removal, stemming, and even synonyms come out of the box with Solr. If you need named entity recognition or base noun-phrase extraction, you need to use OpenNLP or an equivalent tool as a pre-processing stage. You will probably need term vectors for your retrieval purposes. The article "Integrating Apache Mahout with Apache Lucene and Solr" may be useful, as it discusses Lucene and Solr integration with a machine learning (including recommendation) engine. Other than that, feel free to ask further, more specific questions.
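As a rough illustration of the "send some text, get the tokens back" idea, Solr exposes a field analysis request handler that runs text through a field type's analyzer chain and reports the tokens at each stage. The sketch below only builds the request URL and makes no network call. The base URL, the core name `mycore`, and the field type `text_general` are assumptions for illustration, and the handler path can differ between Solr versions and configurations, so check your own setup.

```python
from urllib.parse import urlencode


def build_field_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's field analysis handler.

    base_url, core and field_type are placeholders (assumptions);
    substitute the values from your own Solr installation.
    """
    params = {
        "analysis.fieldtype": field_type,  # analyzer chain to apply
        "analysis.fieldvalue": text,       # text to analyze as if indexing
        "wt": "json",                      # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"


# Hypothetical local Solr instance and core name.
url = build_field_analysis_url(
    "http://localhost:8983/solr", "mycore", "text_general",
    "The quick brown foxes",
)
print(url)
```

Fetching that URL with any HTTP client should return the tokens produced by each tokenizer and filter in the chain, which you can then count for word frequencies.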

Yuval F
Thanks Yuval, great answer. I've had a chat with my superiors and, for whatever reason, we can't have Java running on the server. So, for my purposes, using the underlying Lucene library (the .NET port) would serve me just as well, right? Cheers!
andy
Thanks Andy. Sure, you can use Lucene.Net. The core functionality is there. It is always a little behind the Java version, but I guess you can do a lot with it. As for Lucene versus Solr, using bare Lucene requires handling a lot more integration and configuration details. Please see this question: http://stackoverflow.com/questions/1749314/is-solr-available-for-net and this blog post: http://www.lucidimagination.com/blog/2010/05/26/migrating-from-lucene-to-solr/
Yuval F