Does anybody know whether one exists?
I've been googling this for months...
Thanks
Out of curiosity sparked by your question, I contacted Itamar Syn-Hershko, who was active on the Lucene mailing lists about a year ago while he was working on a Hebrew analyzer for Lucene. I asked him whether he had completed his analyzer. Here are some relevant bits from his response:
To make a long story short, no I didn't. There is no decent free / open-source Hebrew analyzer for Lucene, that I can say for sure. I'm not sure what your background on the subject is, but believe me when I say there is no easy way of doing this; it might also be that Lucene isn't built for Hebrew searches, but I do agree a solution has to be given. Granted, the safest way to index and search Hebrew texts is to use a specialized stemmer, and integration with Lucene is not the easiest even after you've done this. There are a few very good solutions for Hebrew search in the market, only one that I know of is using Lucene in its core; I've recently tried contacting them, no response yet...
The Lucene-based commercial product he mentions is called ATTIVIO, and the ATTIVIO website does claim to have Hebrew support. At SIGTRS (Hebrew Text Retrieval interest group), there has been some discussion of ATTIVIO claiming that it is Lucene-based.
So, apparently, it is possible to create a decent Hebrew analyzer for Lucene, but there is no free analyzer available at this time.
dtSearch has a Hebrew stemming plugin called "pensim". It appears to be developed by "wizcomtech.com".