What is index hashing? What are its advantages over regular hashing techniques?
Hello
Index Hashing
Searchable content is mapped to the search engine using Compass's different mapping definitions (OSEM/XSEM/RSEM). Compass provides the ability to partition the searchable content into different sub indexes, as shown in the next diagram:
Sub Index Hashing
http://www.opensymphony.com/compass/versions/1.1M1/html/images/subindex-hash.png
In the above diagram, A, B, C, and D represent aliases, which in turn stand for the mapping definitions of the searchable content. A1, B2, and so on are actual instances of that searchable content. The diagram shows the different options for mapping searchable content into different sub indexes.
Constant Sub Index Hashing
The simplest way to map an alias (which stands for the mapping definition of a searchable content) is to map all of its searchable content instances into the same sub index. How searchable content maps to the search engine (OSEM/XSEM/RSEM) is defined within the respective mapping definitions. There are two ways to define a constant mapping to a sub index. The first, and simpler, one is:
<compass-core-mapping>
<[mapping] alias="test-alias" sub-index="test-subindex">
<!-- ... -->
</[mapping]>
</compass-core-mapping>
The [mapping] represented by the alias test-alias will map all its instances to test-subindex. Note that if sub-index is not defined, it defaults to the alias value.
Another option, which will probably not be used to define constant sub index hashing but is shown here for completeness, is to specify the constant implementation of SubIndexHash within the mapping definition (explained in detail later in this section):
<compass-core-mapping>
<[mapping] alias="test-alias">
<sub-index-hash type="org.compass.core.engine.subindex.ConstantSubIndexHash">
<setting name="subIndex" value="test-subindex" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>
Modulo Sub Index Hashing
Constant sub index hashing maps an alias (and all the searchable instances it represents) into the same sub index. Modulo sub index hashing allows partitioning an alias into several sub indexes. The partitioning is done by hashing the alias value together with the string values of all the searchable content ids, and then applying the modulo operation against a specified size. It also allows setting a constant prefix for the generated sub index value. This is shown in the following diagram:
Modulo Sub Index Hashing
Here, A1, A2, and A3 represent different instances of alias A (whether a mapped Java class in OSEM, a Resource in RSEM, or an XmlObject in XSEM), each with a single id mapping whose value is 1, 2, and 3 respectively. Modulo hashing is configured with a prefix of test and a size of 2. This results in the creation of 2 sub indexes, called test_0 and test_1. Based on the hashing function (the alias string hash code combined with the string hash codes of the ids), instances of A are directed to their respective sub index. Here is how alias A would be configured:
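<compass-core-mapping>
<[mapping] alias="A">
<sub-index-hash type="org.compass.core.engine.subindex.ModuloSubIndexHash">
<setting name="prefix" value="test" />
<setting name="size" value="2" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>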
Naturally, more than one mapping definition can map to the same sub indexes using the same modulo configuration:
Complex Modulo Sub Index Hashing
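To make the modulo scheme concrete, here is a minimal standalone Java sketch of the idea described above. It does not use the Compass classes: the class name ModuloSubIndexSketch is made up for illustration, ids are passed as plain strings rather than Compass Property objects, and the way the hash codes are combined (multiplying by 31) is an assumption, since the text only says that the alias and id string hash codes are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the Compass ModuloSubIndexHash class.
final class ModuloSubIndexSketch {

    private final String prefix;
    private final int size;

    ModuloSubIndexSketch(String prefix, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.prefix = prefix;
        this.size = size;
    }

    // All sub indexes this configuration can produce: prefix_0 .. prefix_(size - 1).
    List<String> getSubIndexes() {
        List<String> subIndexes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            subIndexes.add(prefix + "_" + i);
        }
        return subIndexes;
    }

    // Combines the alias hash code with the hash code of each id string value,
    // then reduces the result modulo the configured size.
    String mapSubIndex(String alias, String... idValues) {
        int hash = alias.hashCode();
        for (String id : idValues) {
            hash = 31 * hash + id.hashCode(); // assumed combination step
        }
        // floorMod keeps the bucket non-negative even if the hash overflowed.
        return prefix + "_" + Math.floorMod(hash, size);
    }

    public static void main(String[] args) {
        ModuloSubIndexSketch hash = new ModuloSubIndexSketch("test", 2);
        System.out.println(hash.getSubIndexes());
        for (String id : new String[] {"1", "2", "3"}) {
            System.out.println("A" + id + " -> " + hash.mapSubIndex("A", id));
        }
    }
}
```

Because the alias is part of the hash, two aliases that share the same prefix and size write to the same set of sub indexes, but instances with equal ids do not necessarily land in the same one.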
Custom Sub Index Hashing
ConstantSubIndexHash and ModuloSubIndexHash are implementations of the Compass SubIndexHash interface that come built in with Compass. Naturally, a custom implementation of the SubIndexHash interface can also be configured in the mapping definition.
An implementation of SubIndexHash must provide two operations. The first, getSubIndexes, must return all the possible sub indexes the sub index hash implementation can produce. The second, mapSubIndex(String alias, Property[] ids), uses the provided alias and ids to compute the sub index. If the sub index hash implementation also implements the CompassConfigurable interface, different settings can be injected into it. Here is an example of a mapping definition with a custom sub index hash implementation:
<compass-core-mapping>
<[mapping] alias="A">
<sub-index-hash type="eg.MySubIndexHash">
<setting name="param1" value="value1" />
<setting name="param2" value="value2" />
</sub-index-hash>
<!-- ... -->
</[mapping]>
</compass-core-mapping>
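As a rough idea of what a class like eg.MySubIndexHash could look like, here is a standalone Java sketch. To keep it runnable without Compass on the classpath, it declares its own stand-in interfaces: SubIndexHash here takes id values as strings instead of Property[], and Configurable takes a Map instead of the settings object that CompassConfigurable receives. The routing rule (numeric ids versus everything else) and the meaning given to param1 and param2 are invented for illustration.

```java
import java.util.Map;

// Hypothetical illustration; the nested interfaces are stand-ins, not the Compass types.
final class CustomSubIndexHashSketch {

    // Stand-in for Compass's SubIndexHash (the real one takes Property[] ids).
    interface SubIndexHash {
        String[] getSubIndexes();
        String mapSubIndex(String alias, String[] idValues);
    }

    // Stand-in for CompassConfigurable (the real one receives Compass settings).
    interface Configurable {
        void configure(Map<String, String> settings);
    }

    // Routes instances whose first id is all digits to one sub index, the rest to another.
    static final class MySubIndexHash implements SubIndexHash, Configurable {
        private String numericSubIndex;
        private String otherSubIndex;

        @Override
        public void configure(Map<String, String> settings) {
            // param1/param2 mirror the setting names in the mapping example;
            // treating them as sub index names is an assumption of this sketch.
            numericSubIndex = settings.get("param1");
            otherSubIndex = settings.get("param2");
            if (numericSubIndex == null || otherSubIndex == null) {
                throw new IllegalArgumentException("param1 and param2 are required");
            }
        }

        @Override
        public String[] getSubIndexes() {
            // Must list every value mapSubIndex can return.
            return new String[] {numericSubIndex, otherSubIndex};
        }

        @Override
        public String mapSubIndex(String alias, String[] idValues) {
            boolean numeric = idValues.length > 0
                    && !idValues[0].isEmpty()
                    && idValues[0].chars().allMatch(Character::isDigit);
            return numeric ? numericSubIndex : otherSubIndex;
        }
    }

    public static void main(String[] args) {
        MySubIndexHash hash = new MySubIndexHash();
        hash.configure(Map.of("param1", "value1", "param2", "value2"));
        System.out.println(hash.mapSubIndex("A", new String[] {"42"}));
        System.out.println(hash.mapSubIndex("A", new String[] {"abc"}));
    }
}
```

The point to carry over to a real implementation is the contract between the two operations: whatever mapSubIndex returns must be one of the values reported by getSubIndexes.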
Source: http://www.opensymphony.com/compass/versions/1.1M1/html/core-searchengine.html