I am trying to add full-text search to my RoR app, but I'm running into issues with Arabic. AFAIK, few search engines support Arabic stemming, morphology, and other advanced full-text search features. The only option I have found is Lucene with the AraMorph tokenizer.

The acts_as_solr plugin (Solr is based on Lucene, and this plugin integrates it with Rails) seems to be abandoned, and I can't find any helpful documentation for it.

I've looked into Sphinx, Xapian, Ferret, and acts_as_searchable, but to the best of my knowledge none of them offers advanced Arabic search functionality.

Any help would be really appreciated.

== Update
I've had suggestions to use Sphinx. I did use it on an earlier project, and it works just fine. However, it does not provide any advanced search capabilities.
For instance, the words كتاب (book), مكتبة (library), and كاتب (writer) are all derived from the same stem, كتب. I want to be able to search for "writer" and get results for all words derived from that stem.
I also want the search to take common Arabic spelling conventions into account. Some people write the "hamza" (همزة) and some don't. Some write words with the letter "taa marboota" (التاء المربوطة) while others use the letter "haa" (الهاء) instead. A good Arabic search engine should recognize such subtle differences and match across them.
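To make the spelling variation concrete, here is a minimal Ruby sketch of the kind of character folding I mean, applied to text before indexing and to queries before searching. The `ArabicFolding` module and its mapping table are my own hypothetical helper, not part of Sphinx, Solr, or any gem, and the mapping covers only the hamza-on-alef and taa marboota cases mentioned above. It does no stemming at all.

```ruby
# Hypothetical helper for illustration only; not an API of any search engine or gem.
module ArabicFolding
  # Tashkeel (diacritic marks, U+064B..U+0652) and tatweel (U+0640) are dropped.
  MARKS = /[\u064B-\u0652\u0640]/

  # Spelling variants mapped to a single canonical letter (an assumed, partial table).
  FOLD = {
    "\u0623" => "\u0627", # alef with hamza above -> bare alef
    "\u0625" => "\u0627", # alef with hamza below -> bare alef
    "\u0622" => "\u0627", # alef with madda       -> bare alef
    "\u0629" => "\u0647"  # taa marboota          -> haa
  }.freeze

  VARIANTS = /[\u0623\u0625\u0622\u0629]/

  # Returns a folded copy of +text+ suitable for use as an index or query key.
  def self.fold(text)
    text.gsub(MARKS, "").gsub(VARIANTS, FOLD)
  end
end

# "library" spelled with taa marboota and with haa folds to the same key.
puts ArabicFolding.fold("\u0645\u0643\u062A\u0628\u0629") ==
     ArabicFolding.fold("\u0645\u0643\u062A\u0628\u0647")
```

Folding like this only makes the two spellings of the same word match each other; it would not make كاتب match كتاب, which is the stemming part of the problem.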

With Sphinx you only get exact matches for what you search for. The only engine I have found that handles these aspects of Arabic is Lucene with the AraMorph tokenizer. However, acts_as_solr (the Lucene plugin for Rails) is abandoned. So my question is: is there any other such tokenizer for any search engine?
KandadaBoggu mentioned Sunspot. I'll give that a go and report back.

+1  A: 

For Solr, use Sunspot and Sunspot Rails.

For Sphinx, use Thinking Sphinx.

Both gems are excellent and have a large install base. I have used Thinking Sphinx in a few projects and I highly recommend it.

KandadaBoggu
I've used Sphinx + Thinking Sphinx with Arabic. It works fine.
uzzz
Sphinx is great. I used it on an Arabic project before and it works. However, it does not provide any advanced full-text search capabilities. It only searches for exactly what I give it: no stemming, no morphology, and no handling of Arabic diacritics. I'll give Sunspot a go and report back.
Faisal
Solr with Sunspot Rails seems to be a very solid search engine. However, I did manage to get it working with AraMorph (the Arabic stemmer). Thanks for the tip anyway.
Faisal
+1  A: 

You can try this by extending the Thinking Sphinx options.

Read this: http://www.expressionlab.com/2008/11/19/thinking-sphinx-in-arabic-unicode

amrnt
Thanks for the link. I came across it earlier and did manage to get Sphinx running on an Arabic site. Please check the update to my question for a better explanation of my problem.
Faisal
I'll accept this as the answer because it provides Arabic search with character folding. Stemming, however, is still not solved.
Faisal