What is VInt in Lucene?

I read this article, but I don't understand what it is or where Lucene uses it. Why doesn't Lucene use a simple integer or a big integer?

Thanks.

+1  A: 

VInt refers to Lucene's variable-width integer encoding scheme. It encodes an integer in one or more bytes, using only the low seven bits of each byte for data, with the least significant seven bits first. The high bit is set to one on every byte except the last, where it is zero; that is how the reader knows where the value ends.
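As a rough illustration of the scheme, here is a minimal Java sketch. The class `VIntSketch` and its methods are hypothetical names made up for this example, not Lucene's own code. It assumes the convention in Lucene's file-format documentation: seven data bits per byte, low-order group first, and a high bit of 1 meaning "more bytes follow".

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: VIntSketch is a hypothetical class, not Lucene's implementation.
final class VIntSketch {

    // Encode an int as 1 to 5 bytes, low-order 7 bits first.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // While more than 7 significant bits remain, emit 7 bits with the high bit set.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        // Last byte: high bit clear, which marks the end of the value.
        out.write(value);
        return out.toByteArray();
    }

    // Decode a single value from the start of the array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16383, 16384, Integer.MAX_VALUE}) {
            byte[] encoded = encode(v);
            System.out.println(v + " -> " + encoded.length + " byte(s), decoded " + decode(encoded));
        }
    }
}
```

In this sketch, 128 is the first value that needs two bytes: it comes out as `0x80 0x01`.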

Marcelo Cantos
I know this, but I want to know why Lucene does it this way. Why doesn't it just store a plain integer (0 to ~4,000,000,000) in 4 bytes?
Mehdi Amrollahi
+1  A: 

VInt is extremely space efficient. In theory it can save up to 75% of the space: a value below 128 takes one byte instead of four.
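To make the size arithmetic concrete, here is a small sketch that counts how many bytes a VInt needs and compares that with fixed 4-byte integers. `VIntSizeSketch`, `vintLength` and `savingsVsFixed` are hypothetical names for this example, and the sample values are invented, not taken from a real index.

```java
// Illustrative sketch only: hypothetical helper names, made-up sample data.
final class VIntSizeSketch {

    // Number of bytes needed at 7 data bits per byte (1 to 5 for an int).
    static int vintLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    // Fraction of space saved compared with storing every value in 4 bytes.
    // The result is negative if the values are large enough to need 5 bytes each.
    static double savingsVsFixed(int[] values) {
        int vintBytes = 0;
        for (int v : values) {
            vintBytes += vintLength(v);
        }
        int fixedBytes = 4 * values.length;
        return 1.0 - (double) vintBytes / fixedBytes;
    }

    public static void main(String[] args) {
        int[] small = {1, 2, 3, 100};          // every value fits in one byte
        int[] mixed = {5, 300, 70000, 1 << 28}; // 1 + 2 + 3 + 5 bytes
        System.out.println("small values: " + savingsVsFixed(small));
        System.out.println("mixed values: " + savingsVsFixed(mixed));
    }
}
```

The 75% figure is the best case, where every value is below 128. A value of 2^28 or more takes five bytes, one more than a fixed-width int, so the actual saving depends on how small the stored numbers tend to be.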

In Lucene, many of the structures are lists of integers: for example, the list of documents for a given term, and the positions (and offsets) of terms within documents, among others. These lists form the bulk of Lucene's data.

Think of Lucene indices for millions of documents that need tens of GB of space. Shrinking that by more than half reduces disk space requirements. Saving disk space may not be a big win in itself, given that disk space is cheap; the real gain comes from reduced disk I/O. Reading VInt data requires less disk I/O than reading fixed-width integers, which translates directly into better performance.

Shashikant Kore