views:

511

answers:

2

I've been looking into faceted search with Lucene.NET, I've found a brilliant example here which explains a fair amount, apart from the fact that it completely overlooks the function which checks the cardinality of items in a bit array.

Can anyone give me a run down of what it is doing? The main things I don't understand is why the bitsSetArray is created as it is, what it is used for and how all the if statements work in the for loop.

This may be a big ask but I have to understand how this works before I can even think of using it in my own code.

Thanks

public static int GetCardinality(BitArray bitArray)
 {
  var _bitsSetArray256 = new byte[] {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};
  var array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);
  int count = 0;

  for (int index = 0; index < array.Length; index ++)
   count += _bitsSetArray256[array[index] & 0xFF] + _bitsSetArray256[(array[index] >> 8) & 0xFF] + _bitsSetArray256[(array[index] >> 16) & 0xFF] + _bitsSetArray256[(array[index] >> 24) & 0xFF];

  return count;
 }
+2  A: 

The _bitsSetArray256 array is initialised with values such that _bitsSetArray256[n] contains the number of bits set in the binary representation of n, for n in 0..255.

For example, _bitsSetArray256[13] equals 3, because 13 in binary is 1101 which contains 3 1s.

The reason for doing this is that it's far faster to pre-compute these values and store them, rather than having to work them out each time (or on-demand). It's not like the number of 1s in the binary representation of 13 is ever going to change, after all :)

Within the for loop, we are looping through an array of uints. A C# uint is a 32-bit quantity, ie made up for 4 bytes. Our lookup table tells us how many bits are set in a byte, so we must process each of the four bytes. The bit manipulation in the count += line extracts each of the four bytes, then gets its bit count from the lookup array. Adding together the bit counts for all four bytes gives the bit count for the uint as a whole.

So given a BitArray, this function digs into the uint[] m_array member, then returns the total number of bits set in the binary representation of the uints therein.

AakashM
Brilliant, thanks AakashM. Some of this still goes over my head but at least I understand the concept of the method and exactly what it is doing.
John_
+3  A: 

I just wanted to post a helpful article about bitArrays for those of us who are developing our own versions of Faceting with Lucene.net. See: http://dotnetperls.com/precomputed-bitcount

This is a good explination on the fastet way to get the cardinality of the on bits in an integer ( which is a bulk of what the above code sample does ).

Imlementing the method in the article in my faceted search and some other simple changes i was able to cut the time it took the get the count by ~ 65%. The differences where in:

  1. Declaring the _bitcount global ( so its not created per call )
  2. Changing the for to foreach (ANT Profiler showed a 25% gain here)
  3. Implementening the 65535 table vs the 256 to shift 16 bits at a time rather then 8.

    private static int[] _bitcounts = InitializeBitcounts();
    

    private static int GetCardinality(BitArray bitArray) { uint[] array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);

        int count = 0;
        foreach (uint value in array)
        {
            count += _bitcounts[value & 65535] + _bitcounts[(value >> 16) & 65535];           
        }
        return count;
    }
    
    
    private static int[] InitializeBitcounts()
    {
        int[] bitcounts = new int[65536];
        int position1 = -1;
        int position2 = -1;
        //
        // Loop through all the elements and assign them.
        //
        for (int i = 1; i < 65536; i++, position1++)
        {
            //
            // Adjust the positions we read from.
            //
            if (position1 == position2)
            {
                position1 = 0;
                position2 = i;
            }
            bitcounts[i] = bitcounts[position1] + 1;
        }
        return bitcounts;
    }
    
John Morrison