Hi,

I am playing around with Lucene and 40 GB of data (~500M tuples, each with 2 fields behaving like a key/value pair). I have created, to my surprise, a 35 GB index that does not work. Therefore I want to split it into a set of smaller indices, but for that I need to know the maximum size of an index.
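A minimal sketch of the "set of smaller indices" idea, assuming each tuple becomes one document and each key is routed to a shard by its hash, so a lookup only has to search one shard. `ShardRouter`, `shardsNeeded`, `shardFor` and the 50M-documents-per-shard figure are hypothetical and chosen only for illustration. The code uses no Lucene API; each shard would still be a separate Lucene index that you open, write and search yourself.

```java
// Hypothetical helper: decides how many smaller indexes to create and which
// one a given key belongs to. Plain Java, no Lucene dependency.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Ceiling division: the fewest shards such that none exceeds maxDocsPerShard.
    public static int shardsNeeded(long totalDocs, long maxDocsPerShard) {
        return (int) ((totalDocs + maxDocsPerShard - 1) / maxDocsPerShard);
    }

    // The same key always maps to the same shard; floorMod keeps the result
    // non-negative even when hashCode() is negative.
    public int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        long totalDocs = 500_000_000L;      // ~500M tuples from the question
        long maxDocsPerShard = 50_000_000L; // assumption: illustrative shard size
        int n = shardsNeeded(totalDocs, maxDocsPerShard);
        ShardRouter router = new ShardRouter(n);
        System.out.println("shards=" + n);
        System.out.println("key 'apple' -> shard " + router.shardFor("apple"));
    }
}
```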

A: 

What filesystem do you use? Are you absolutely sure that you have created a valid index? How exactly are you indexing your data?

Theoretically, you shouldn't be anywhere near the maximum.

Limitations

When referring to term numbers, Lucene's current implementation uses a Java int, which means the maximum number of unique terms in any single index segment is 2,147,483,648. This is technically not a limitation of the index file format, just of Lucene's current implementation.

Similarly, Lucene uses a Java int to refer to document numbers, and the index file format uses an Int32 on-disk to store document numbers. This is a limitation of both the index file format and the current implementation. Eventually these should be replaced with either UInt64 values, or better yet, VInt values which have no limit.

http://lucene.apache.org/java/3_0_0/fileformats.html#Limitations
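To put the quoted document-number limit next to the ~500M tuples in the question, here is a small sketch, assuming one document per tuple. It treats `Integer.MAX_VALUE` as the upper bound implied by int document numbers; later Lucene versions reserve a little less than that, so read it as an approximate ceiling. `DocLimitCheck` and `fitsInOneIndex` are hypothetical names.

```java
// Compares a document count against the ceiling implied by Lucene 3.x using
// a Java int (Int32 on disk) for document numbers.
public class DocLimitCheck {
    // Approximate upper bound on documents in one index: 2,147,483,647.
    static final long MAX_DOCS = Integer.MAX_VALUE;

    static boolean fitsInOneIndex(long docCount) {
        return docCount <= MAX_DOCS;
    }

    public static void main(String[] args) {
        long docs = 500_000_000L; // assumption: one document per key/value tuple
        System.out.println("fits=" + fitsInOneIndex(docs));
        // Ratio between the int ceiling and the actual document count.
        System.out.printf("headroom=%.2fx%n", (double) MAX_DOCS / docs);
    }
}
```

At roughly 4x headroom, the document count alone is unlikely to be what broke the 35 GB index.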

Matthijs Bierman
I am using the index as a map (key, value). The keys are indexed and normalized. The values are stored. My platform is Windows XP on NTFS.
Skarab
A: 

Are you using MMapDirectory on a 32-bit JVM? If so, the address space (at most 4 GB for a 32-bit process, and less in practice) is not large enough to map the whole index, and that might have caused the problem. In that case, use SimpleFSDirectory or NIOFSDirectory instead. Note that functions like FSDirectory.open(File) return an FSDirectory, which might or might not be an MMapDirectory.

Kai Chan
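A rough sketch of the address-space argument, assuming the JVM exposes its pointer width through the `sun.arch.data.model` system property. HotSpot-based JVMs set it; other JVMs may not, so the code treats a missing value as unknown. `MmapFitCheck` and its methods are hypothetical. Passing this check is necessary but not sufficient: heap, thread stacks and native libraries share the same address space, and 32-bit Windows XP gives a process only 2 GB of user space by default.

```java
// Rough check of whether a whole index could even fit in the JVM's virtual
// address space, which memory-mapping it with MMapDirectory would require.
public class MmapFitCheck {
    // Theoretical address space for a given pointer width (2^32 bytes = 4 GB for 32-bit).
    static long addressSpaceBytes(int dataModelBits) {
        if (dataModelBits <= 0) {
            throw new IllegalArgumentException("unknown data model");
        }
        // 1L << 63 would overflow, so cap wide data models at Long.MAX_VALUE.
        return dataModelBits >= 63 ? Long.MAX_VALUE : 1L << dataModelBits;
    }

    // Necessary (not sufficient) condition for mapping the whole index at once.
    static boolean mmapPlausible(int dataModelBits, long indexBytes) {
        return indexBytes < addressSpaceBytes(dataModelBits);
    }

    // Returns 32 or 64 on HotSpot-based JVMs, or -1 if the property is missing or odd.
    static int currentDataModelBits() {
        String model = System.getProperty("sun.arch.data.model");
        if (model == null) {
            return -1;
        }
        try {
            return Integer.parseInt(model);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        long indexBytes = 35L * 1024 * 1024 * 1024; // the 35 GB index from the question
        int bits = currentDataModelBits();
        if (bits <= 0) {
            System.out.println("JVM data model unknown; check whether it is 32- or 64-bit");
        } else if (mmapPlausible(bits, indexBytes)) {
            System.out.println(bits + "-bit JVM: MMapDirectory may be usable");
        } else {
            System.out.println(bits + "-bit JVM: use SimpleFSDirectory or NIOFSDirectory instead");
        }
    }
}
```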