ansaurus

Question

Please help me find the official name of this programming approach.

Answer 1

+1 A:

Is the idea of a bit field what you're looking for? Every bit of your ... field (for lack of a better word) represents a flag. In this case, your flag is membership in set N.

Edit - I think I misunderstood which idea you were referring to. Disregard?

Sapph 2010-01-06 00:25:42

Ok, bit field is the value, but what about using the bit field for queries?

Hamish Grubijan 2010-01-06 00:28:47

It's not **exactly** what you're talking about, but you might be interested in reading up on "bloom filters". They use a bit field to check for set membership (but can turn up false positives).

Sapph 2010-01-06 00:34:03

Answer 2

+1 A:

It's very common to use an integer to represent a set of small integers; it's often called a bitset or bitvector. Here you're using an integer to represent "the set of input sequences that contain this value".

The operation you're doing reminds me of reversing a multimap.

In your case, the input is a list of lists:

[["a", "b"], ["a", "c", "d"]]

But you could instead think of it as a bag of ordered pairs, like this:

0, "a"
0, "b"
1, "a"
1, "c"
1, "d"

You're simply constructing a table that contains the reversed pairs

"a", 0
"b", 0
"a", 1
"c", 1
"d", 1

which looks like this:

{"a": [0, 1],
 "b": [0],
 "c": [1],
 "d": [1]}

and you happen to be representing those arrays of integers using bitvectors.

The original data structure (the list of lists) made it easy to iterate over all values for a given list. The reversed data structure (the dictionary of lists) makes it easy to find all lists that have a given value.

Jason Orendorff 2010-01-06 00:27:50

Answer 3

+7 A:

Using an N bit integer to represent N booleans is a special case of the data structure known as a perfect hash table. Notice you're explicitly using dicts (which are general hash tables) in the idea that prompted you to think about bitsets. It's a hash table because you use hashes to find a value, and it's perfect because you never have collisions. The special case is because of how the table is packed and stored.

Formulate the hash function, which shows how it's different from an array:

int bitset_hash(int n) {
  // domain of this function is only non-negative ints
  return 1 << n;
}

Notice bitset_hash(3) is 0b1000, which corresponds to the 4th item (offset/index 3) when using a C int and bitwise operations. (Because of the storage implementation detail, bitwise operations are also used to manipulate a specific item from the hash.)

Extending the approach to use bitwise-and/-or/-xor for set operations is common, and doesn't require any special name other than "set operations" or, if you need a buzzword, "set theory".

Finally, here's another example use of it in a prime sieve (I've used this code on Project Euler solutions):

class Sieve(object):
  def __init__(self, stop):
    self.stop = stop
    self.data = [0] * (stop // 32 // 2 + 1)
    self.len = 1 if stop >= 2 else 0
    for n in xrange(3, stop, 2):
      if self[n]:
        self.len += 1
        for n2 in xrange(n * 3, stop, n * 2):
          self[n2] = False

  def __getitem__(self, idx):
    assert idx >= 2
    if idx % 2 == 0:
      return idx == 2
    int_n, bit_n = divmod(idx // 2, 32)
    return not bool(self.data[int_n] & (1 << bit_n))

  def __setitem__(self, idx, value):
    assert idx >= 2 and idx % 2 != 0
    assert value is False
    int_n, bit_n = divmod(idx // 2, 32)
    self.data[int_n] |= (1 << bit_n)

  def __len__(self):
    return self.len

  def __iter__(self):
    yield 2
    for n in xrange(3, self.stop, 2):
      if self[n]:
        yield n

Roger Pate 2010-01-06 01:55:01

-1 for calling an array a special case of a hash table :)

Jason Orendorff 2010-01-06 13:56:33

Jason Orendorff 2010-01-06 14:13:59

I did call it a special case due to exactly how it's stored. In a non-perfect hash table, which is most common, I wouldn't call the collision resolution a "flat array", even though an array is commonly used, as there's much more going on. To me that's an implementation detail anyway, and it's the fast lookup of values from hash(key) that's the important part of the definition. -- Though I just realized that if you "take my definitions", you get them all and there's thus no problem. :)

Roger Pate 2010-01-06 17:43:50

I didn't say the collision resolution was a flat array. That makes no sense.

Jason Orendorff 2010-01-06 18:00:29

*"it's the fast lookup of values from hash(key) that's the important part of the definition"* So your definition of a hash table is "any data structure with fast lookup based on hash(key)". I think this is too loose a definition because, as you've demonstrated, you can point to any intermediate result of the lookup algorithm and call it a hash code. Under your definition, a simple array is a hash table with hash(idx) = (addr + idx * size).

Jason Orendorff 2010-01-06 18:07:22

More precisely, you were completely ignoring collisions. The popular array-of-linked-lists implementation, for example, *is not a flat array*. And no, if you want to stretch things that way, a simple array would be a hash table with an identity hash function---which is only a hash function in the most basic sense. Please don't put words in my mouth, as I never said emphasized *any data structure* like that, just that fast lookup from keys is *more* important than implementation issues.

Roger Pate 2010-01-06 18:32:27

An array is a special case of a dictionary (or, alternatively, a dictionary is a generalization of array). Dictionaries are often implemented using hash tables. Therefore, it's not a huge stretch to call an array a special case of hash table.

Wayne Conrad 2010-01-06 19:10:22

Roger: I didn't mean to put words in your mouth, just understand what you were saying. Cheers!

Jason Orendorff 2010-01-06 21:48:36

Answer 4

+4 A:

Yes, it is sometimes used in databases, for example PostgreSQL. As mentions Wikipedia:

Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table.

(from http://en.wikipedia.org/wiki/Bitmap_index)

Antoine P. 2010-01-06 02:03:27

ansaurus

tags:

views:

answers:

Please help me find the official name of this programming approach.

related questions