views:

963

answers:

7

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.

I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.

What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?

I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.

+1  A: 

I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. think of it like an "array of booleans";

(journalling masks in flash memory if you want to know)

many compilers know how to do this for you. Adda bit of OO code to have types that operate senibly and then your code starts looking like it's intent, not some bit-banging.

My 2 cents.

Tim Williscroft
so are you suggesting to perhaps store it in the database as a binary field of variable length (BLOB?), and then when processing it, convert to an array of bools? that could work - what's the data type you should use in the DB?
nickf
+1  A: 

With a 64-bit integer, you can store values up to 2^64-1, 64 is only 2^6. So yes, there is a limit, but if you need more than 64-its worth of flags, I'd be very interested to know what they were all doing :)

How many states so you need to potentially think about? If you have 64 potential states, the number of combinations they can exist in is the full size of a 64-bit integer.

If you need to worry about 128 flags, then a pair of bit vectors would suffice (2^64 * 2).

Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding used 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.

warren
yeah, i meant "64 flags" (2 ^ 64), not "64 combinations" (2 ^ 6).
nickf
I figured that's what you meant, but wanted to make the clarification in my answer :)
warren
A: 

For example .NET uses array of integers as an internal storage for their BitArray class. Practically there's no other way around.

That being said, in SQL you will need more than one column (or use the BLOBS) to store all the states.

arul
+2  A: 

Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):

// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //set the right bit
  bytes[offset >> 3] |= (1 << (offset & 0x7));

  return 0; //success 
}

//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
  // make sure offset is valid
  if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }

  //get the right bit
  return (bytes[offset >> 3] & (1 << (offset & 0x7));
}
Mike Spross
+1  A: 

Some languages ( I believe perl does, not sure ) permit bitwise arithmetic on strings. Giving you a much greater effective range. ( (strlen * 8bit chars ) combinations )

However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space efficiency reasons, but for practical development reasons.

( Php uses this system to control its error-messages, and I have already found that its a bit over-the-top when you have to define values where php's constants are not resident and you have to generate the integer by hand, and to be honest, if chmod didn't support the 'ugo+rwx' style syntax I'd never want to use it because i can never remember the magic numbers )

The instant you have to crack open a constants table to debug code you know you've gone too far.

Kent Fredric
A: 

You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.

Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.

Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.

Mark Ransom
yep, the mysql datatype BIGINT is 64 bit. I was wondering what field type to use should you need more than 64 flags.
nickf
A: 

I'd suggest you don't store an int representing various flags in a field and instead use a field for each flag.

I think doing so would negate the advantages of using SQL.

Suppose, for instance, this scenario:

0x01 ==> read

0x02 ==> write

0x04 ==> execute

Now, suppose I want to retrieve all records where execute permission is granted.

You can't just do

WHERE flags = 4

because if a record has read and write permissions too, its value would be 7...

I haven't seen bitwise manipulation in SQL -- maybe it's possible, but I think that even in such case, it'd be rare enough to avoid it, and probably even worse than using char(1) fields performance-wise.

PD: yes, you can do something like

WHERE flags > 3 AND flags < 8,

but then you're not treating it as a bit pattern, which is how flags are approached and understood by almost everyone.

EDIT:

@nickf (Sorry, I don't have enough "reputation" to post a comment)

Fair enough, I'm familiar with bitwise operations, but I hadn't come across them in SQL, so I wasn't sure they were available. Still, I think individual fields are a better option and fit better in a relational DB model.

And, in fact, doing flags & 2 will not match 4 values if you're dealing with an int(32) field, but 2^31 values... (and 2^63 values for a 64 bit int).

Juan Pablo Califano
nickf