How can I read a Lucene index directory stored on HDFS, i.e. how do I get an IndexReader for an index stored on HDFS? The IndexReader is to be opened in a map task.

Something like: IndexReader reader = IndexReader.open("hdfs/path/to/index/directory");

Thanks, Akhil

A: 

If you want to open a Lucene index stored in HDFS for searching, you're out of luck. AFAIK, there is no Directory implementation for HDFS that supports search operations. One reason is that HDFS is optimized for sequential reads of large blocks, not the small, random reads that Lucene performs.
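To make the access-pattern point concrete, here is a rough sketch that contrasts the two styles of reading on an ordinary local temp file. It does not touch HDFS or Lucene and measures no timings; it only counts seeks and bytes read. The class name, file size, chunk size, and record size are all hypothetical values chosen for illustration. On HDFS, each of those small seeks can involve a network round trip, which is why the second pattern is the expensive one.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class AccessPatternSketch {
    // Hypothetical sizes, chosen only for illustration.
    static final int FILE_SIZE = 1 << 20;  // 1 MiB stand-in for an index file
    static final int CHUNK = 64 * 1024;    // large sequential read
    static final int RECORD = 32;          // small read, e.g. one index entry

    /** Creates a zero-filled local temp file standing in for an index file. */
    static Path makeFile() throws IOException {
        Path p = Files.createTempFile("segment", ".bin");
        Files.write(p, new byte[FILE_SIZE]);
        p.toFile().deleteOnExit();
        return p;
    }

    /** Streams the file front to back in large chunks; returns {seeks, bytesRead}. */
    static long[] sequentialScan(Path file) throws IOException {
        long bytes = 0;
        byte[] buf = new byte[CHUNK];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long remaining = raf.length();
            while (remaining > 0) {
                int n = (int) Math.min(CHUNK, remaining);
                raf.readFully(buf, 0, n);
                bytes += n;
                remaining -= n;
            }
        }
        return new long[] {0, bytes};  // no explicit seek was needed
    }

    /** Does `lookups` small reads at scattered offsets; returns {seeks, bytesRead}. */
    static long[] randomLookups(Path file, int lookups, long seed) throws IOException {
        long seeks = 0;
        long bytes = 0;
        byte[] buf = new byte[RECORD];
        Random rnd = new Random(seed);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long maxOffset = raf.length() - RECORD;
            for (int i = 0; i < lookups; i++) {
                long offset = (long) (rnd.nextDouble() * maxOffset);
                raf.seek(offset);      // one seek per lookup
                seeks++;
                raf.readFully(buf);
                bytes += RECORD;
            }
        }
        return new long[] {seeks, bytes};
    }

    public static void main(String[] args) throws IOException {
        Path file = makeFile();
        long[] seq = sequentialScan(file);
        long[] rnd = randomLookups(file, 1000, 42L);
        System.out.println("sequential: seeks=" + seq[0] + " bytes=" + seq[1]);
        System.out.println("random:     seeks=" + rnd[0] + " bytes=" + rnd[1]);
    }
}
```

The sequential scan moves the whole file without a single seek, while the lookups pay one seek for every 32 bytes they fetch. A search workload looks much more like the second method than the first.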

In the Nutch project, there is an implementation of HDFSDirectory which you can use to create an IndexReader, but only delete operations work. Nutch only uses HDFSDirectory to perform document deduplication.

bajafresh4life
There is indeed a Directory implementation, http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/contrib/index/lucene/FileSystemDirectory.html, for opening a Lucene directory over a general file system.
Akhil
But unfortunately I am getting this AbstractMethodError (http://stackoverflow.com/questions/2763038/java-abstractmethoderror) when I use IndexReader.open(new FileSystemDirectory(.....)) in a map task.
Akhil
Didn't realize FileSystemDirectory existed. I would be wary of it though. It doesn't seem like it's actively maintained.
bajafresh4life