
Is it possible to modify Lucene 2.2 to add an Arabic analyzer? If anyone has already done this, where can I get the source/jar?

A:

Lucene 3.0.1 has an Arabic analyzer (ArabicAnalyzer). It is in the contrib package.

You can upgrade to Lucene 3.0.1 to get this working out of the box. You probably will not be able to use it as-is with Lucene 2.2, since the TokenStream APIs have changed in this release. However, back-porting the changes to 2.2 shouldn't be very difficult if you don't want to migrate to the latest Lucene release.
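To illustrate why a back-port is mostly mechanical: the character-level work is plain string logic that does not depend on the TokenStream API. Below is a minimal, self-contained sketch of the orthographic normalization that Lucene's ArabicNormalizer documents: alef with hamza or madda becomes bare alef, teh marbuta becomes heh, alef maksura becomes yeh, and tatweel and the harakat (diacritics) are removed. This is not Lucene's code. The class name ArabicNormalizeSketch is hypothetical, and the sketch leaves out the stemming and stopword steps that the real analyzer also performs.

```java
/**
 * Hypothetical, self-contained sketch of Arabic orthographic normalization,
 * modeled on the rules documented for Lucene's ArabicNormalizer.
 * Not Lucene code; names are illustrative only.
 */
public final class ArabicNormalizeSketch {

    private ArabicNormalizeSketch() {}

    /** Returns a normalized copy of a single term. */
    public static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                // Alef with madda (U+0622), hamza above (U+0623), hamza below (U+0625) -> bare alef
                case '\u0622':
                case '\u0623':
                case '\u0625':
                    out.append('\u0627');
                    break;
                // Teh marbuta -> heh
                case '\u0629':
                    out.append('\u0647');
                    break;
                // Alef maksura (dotless yeh) -> yeh
                case '\u0649':
                    out.append('\u064A');
                    break;
                // Tatweel (stretching character) is dropped
                case '\u0640':
                    break;
                default:
                    // Harakat from fathatan (U+064B) to sukun (U+0652) are dropped;
                    // every other character is copied unchanged
                    if (c < '\u064B' || c > '\u0652') {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "madrasa" spelled with teh marbuta normalizes to the heh spelling
        System.out.println(normalize("\u0645\u062F\u0631\u0633\u0629")
                .equals("\u0645\u062F\u0631\u0633\u0647")); // prints true
    }
}
```

In Lucene itself this logic sits inside a TokenFilter (ArabicNormalizationFilter), so back-porting it to 2.2 would mainly mean re-wrapping the same per-term logic in the older TokenStream API that 2.2 uses.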

Shashikant Kore
The reason I thought of just adding an Arabic analyzer to Lucene 2.2 instead of upgrading to the latest release is that I would have to replace all the deprecated methods, since they throw a RuntimeException. In the end, though, I guess I'll migrate to the latest release for maintainability reasons, as I don't want to build my own jar every time a new Lucene feature is released.
Mustafa Zidan
A: 

Alternatively, you can try using lucene-hunspell for an analyzer. It currently works with the Lucene trunk; I do not know whether it works with Lucene 3.0.1. Here is Robert Muir's explanation and a list of dictionaries, including Arabic. I believe you could also back-port this. Shashikant's suggestion seems easier to implement, while this one may give better quality.

Yuval F
A: 

Hello, someone asked me before how to get Arabic and Persian support on Lucene 2.4, so these were unofficially backported here: http://people.apache.org/~rmuir/

http://people.apache.org/~rmuir/lucene-analyzers-2.4.1_with_arabic_and_farsi.jar
http://people.apache.org/~rmuir/arabicFarsiLucene241_contrib.patch
http://people.apache.org/~rmuir/arabicFarsiLucene241_core.patch

This would mean you only have to upgrade to 2.4.1, which might be easier than upgrading to 2.9 or 3.0.

Hope this helps.

Robert Muir