opening lucene index stored in hdfs | ansaurus


Q:

Opening a Lucene index stored in HDFS

How can I read a Lucene index directory stored in HDFS, i.e. how do I get an IndexReader for an index stored in HDFS? The IndexReader is to be opened in a map task.

Something like: IndexReader reader = IndexReader.open("hdfs/path/to/index/directory");

Thanks, Akhil

A:

If you want to open a Lucene index stored in HDFS in order to search it, you're out of luck. AFAIK, there is no Directory implementation for HDFS that supports search operations. One reason is that HDFS is optimized for sequential reads of large blocks, not for the small, random reads that Lucene performs.
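To make the access-pattern point concrete, here is a minimal sketch that uses only the Java standard library on a local temp file. It does not touch HDFS or Lucene; the class name AccessPatternDemo, its method names, and the chunk and record sizes are all made up for illustration. It contrasts one front-to-back pass in large chunks with many small reads at random offsets, each of which needs its own seek.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical illustration only: a local file stands in for remote storage.
public class AccessPatternDemo {

    /** Creates a temp file of the given size filled with pseudo-random bytes. */
    public static Path createTempData(int sizeBytes) {
        try {
            Path file = Files.createTempFile("access-pattern", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[sizeBytes];
            new Random(42).nextBytes(data);
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file front to back in large chunks; returns the number of read calls. */
    public static int sequentialReadCalls(Path file, int chunkSize) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[chunkSize];
            long remaining = raf.length();
            int calls = 0;
            while (remaining > 0) {
                int n = (int) Math.min(chunkSize, remaining);
                raf.readFully(buf, 0, n);
                remaining -= n;
                calls++;
            }
            return calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Performs small reads at random offsets; returns the number of seeks issued. */
    public static int randomReadSeeks(Path file, int lookups, int recordSize) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[recordSize];
            Random rnd = new Random(7);
            long maxOffset = raf.length() - recordSize;
            int seeks = 0;
            for (int i = 0; i < lookups; i++) {
                // nextDouble() is in [0, 1), so the read always stays inside the file.
                long offset = (long) (rnd.nextDouble() * maxOffset);
                raf.seek(offset);
                raf.readFully(buf);
                seeks++;
            }
            return seeks;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createTempData(1 << 20); // 1 MiB of test data
        System.out.println("sequential read calls: " + sequentialReadCalls(file, 64 * 1024));
        System.out.println("random seeks: " + randomReadSeeks(file, 1000, 32));
    }
}
```

On a local disk both patterns are cheap. The point is only that the second pattern issues one positioning operation per small record, which is the kind of workload the answer says HDFS is not optimized for.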

In the Nutch project, there is an implementation of HDFSDirectory which you can use to create an IndexReader, but only delete operations work. Nutch only uses HDFSDirectory to perform document deduplication.

bajafresh4life 2010-05-04 13:20:02

There is in fact a Directory implementation, FileSystemDirectory (http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/contrib/index/lucene/FileSystemDirectory.html), for opening a Lucene directory over a generic file system.

Akhil 2010-05-07 10:08:14

Unfortunately, I get this AbstractMethodError (http://stackoverflow.com/questions/2763038/java-abstractmethoderror) when I call IndexReader.open(new FileSystemDirectory(.....)) in a map task.

Akhil 2010-05-07 10:11:41

Didn't realize FileSystemDirectory existed. I would be wary of it though. It doesn't seem like it's actively maintained.

bajafresh4life 2010-05-07 13:16:19
