Hi, I need to access a Lucene index (created by crawling several webpages with Nutch), but opening it fails with the error shown below:

java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/<path>: files:
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
    at DictionaryGenerator.generateDict(DictionaryGenerator.java:24)
    at DictionaryGenerator.main(DictionaryGenerator.java:56)

I googled the error, but none of the causes I found matched my situation. The fact that files are being shown (the path) probably means that the directory is not empty.
Thanks

+2  A: 

Basically, the error message says that Lucene did not find the proper files in the index directory. I suggest checking the following:

  1. Verify that the path of the index directory is what you think it should be.
  2. Do the Nutch and Lucene versions match? The error may stem from a version difference.
  3. Is there a permissions issue? Can you read the files in the directory?
  4. Try looking at the index using Luke. If you cannot, there is probably some corruption in the index.

If none of these help, please post the indexing part of the code.
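Checks 1 and 3 can be automated before Lucene is involved at all. The following is only a minimal sketch using the plain JDK (no Lucene or Nutch API); the class name `IndexDirLocator` and its two methods are hypothetical names made up for illustration. It walks a directory tree and reports every folder that directly contains a readable regular file whose name starts with `segments`, which is what the `no segments* file found` message refers to. A folder that is merely named `segments` does not count, and as far as I know the `segments` folder in a Nutch crawl directory holds Nutch's own crawl data rather than a Lucene index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Nutch.
class IndexDirLocator {

    /** True if dir directly contains a readable regular file named segments*. */
    static boolean hasSegmentsFile(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "segments*")) {
            for (Path entry : entries) {
                // A sub-folder called "segments" matches the glob but is skipped here.
                if (Files.isRegularFile(entry) && Files.isReadable(entry)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Lists every directory at or below root that looks like a Lucene index. */
    static List<Path> findIndexDirs(Path root) throws IOException {
        List<Path> dirs;
        try (Stream<Path> paths = Files.walk(root)) {
            dirs = paths.filter(Files::isDirectory).collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path dir : dirs) {
            if (hasSegmentsFile(dir)) {
                result.add(dir);
            }
        }
        return result;
    }
}
```

Calling `IndexDirLocator.findIndexDirs(Paths.get(...))` on the top-level crawl folder and printing the result shows which sub-folder, if any, should be passed to `IndexReader.open`. An empty result would suggest that the index is missing or unreadable rather than that the path is wrong.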

Yuval F
I checked all of them except the Nutch and Lucene versions. I was not aware that Lucene and Nutch had to be compatible. If it helps, the Lucene version is 2.2. I can access the files; in fact, I am running the Java program in the same directory as the index. I also checked the index using Luke and it is definitely fine. I have only just joined the project, and the index is the result of an extensive crawl by Nutch, so I do not have any indexing code. It was just a crawl. I will still try to find out the exact details.
crazyaboutliv
One thing I have observed is that the newer version of Nutch (1.1) generates 5 folders after a crawl, while the data I have has only 4 folders (one of which is segments). Could that be the issue?
crazyaboutliv
As Yuval said, make sure that the Java program you use to read the index uses the same version of Lucene that Nutch used to create the index.
Pascal Dimassimo