ansaurus

Question

Flags in Python

Answer 1

+1 A:

BitSet is what you want, since it allows you to store many flags at once using only an fixed size integer (Int type)

dfa 2009-06-29 13:53:34

Answer 2

+7 A:

Your solution is fine if every single cell is going to have a flag. However if you are working with a sparse dataset where only a small subsection of your cells will have flags what you really want is a dictionary. You would want to set up the dictonary so the key is a tuple for the location of the cell and the value is a list of flags like you have in your solution.

allFlags = {(1,1,1):[1,2,3], (250,250,30):[4,5,6]}

Here we have the 1,1,1 cell have the flags 1,2, and 3 and the cell 250,250,30 have the flags 4,5, and 6

edit- fixed key tuples, thanks Andre, and dictionary syntax.

Robbie 2009-06-29 13:56:34

He's using a three dimensional matrix, so you would need so say something like dict((1,1,1)=[1,2,3]...

Andre Miller 2009-06-29 14:07:38

Actually, shouldn't that be {(1,1,1):[1,2,3], ...). dict(key=val) syntax only works if your keys are python identifiers.

Brian 2009-06-29 17:52:18

Answer 3

+3 A:

Consider using Flyweight pattern to share cell properties:

http://en.wikipedia.org/wiki/Flyweight_pattern

Perica Zivkovic 2009-06-29 14:00:25

Answer 4

+5 A:

You can define some constants with different, power of two values as:

FLAG1 = 0x01
FLAG8 = 0x02
FLAG12 = 0x04
...

And use them with boolean logic to store the flags in only one integer, p.e.:

flags = FLAG1 | FLAG8

To check if a flag is enabled, you can use the & operator:

flag1_enabled = flags & FLAG1

If the flag is enabled, this expression will return a non-zero value, that will be evaluated as True in any boolean operation. If the flag is disabled, the expression will return 0, that is evaluated as False in boolean operations.

Jaime Soriano 2009-06-29 14:04:42

This saves me about 20% of the required memory even if I use up to 100 objects, very interesting! One question, how do I remove a flag?

Alex Jurkiewicz 2009-06-29 14:11:53

Maybe not the best approach if you really might have up to 500 flags, although Python does support arbitrarily long integers.

FogleBird 2009-06-29 14:15:03

Jaime Soriano 2009-06-29 14:45:07

Kylotan 2009-06-29 14:55:52

Thanks for this suggestion. Although I'm going to use sets instead of bitfields (I don't need the extra space savings here) I definitely will need something like this in future.

Alex Jurkiewicz 2009-06-29 15:07:46

Actually, I am! see sample code in the question, and thanks for the idea.

Alex Jurkiewicz 2009-06-29 15:30:50

Answer 5

+4 A:

I would generally use a numpy array (presumably of short ints, 2 bytes each, since you may need more than 256 distinct values) -- that would take less than 4MB for the <2 million cells.

If for some reason I couldn't afford the numpy dependency (e.g on App Engine, which doesn't support numpy), I'd use the standard library array module - it only supports 1-dimensional arrays, but it's just as space-efficient as numpy for large homogeneous arrays, and the getter/setter routines you mention can perfectly well "linearize" a 3-items tuple that's your natural index into the single integer index into the 1-D array.

In general, consider numpy (or array) any time you have large homogeneous, dense vectors or matrices of numbers -- Python built-in lists are highly wasteful of space in this use case (due to their generality which you're not using and don't need here!-), and saving memory indirectly translates to saving time too (better caching, fewer levels of indirection, etc, etc).

Alex Martelli 2009-06-29 14:09:34

Thanks for the numpy suggestion. One question about the data type. Since a ushort can hold values up to 2**16, can't I only have 16 flags if I use ushort?

Alex Jurkiewicz 2009-06-29 14:22:42

Sorry... I was still thinking of bitflags as in a previous answer. I see what you mean now, a 4d array, where a[0][0][0] would return a 1d array correct?

Alex Jurkiewicz 2009-06-29 14:23:49

Hmm, I had read your question differently, with just a single flag (or none) per cell (with up to 500 different values), as opposed to up to 500 flags for a single cell. For your actual question a 4-D array (with 512 bits per cell, 64 bytes) is indeed needed, so that's 128 MB (and the translation to/from bit by shift and mask, as other answers have mentioned) -- *IF* you do need to be as "dense" as that (if the _typical_ cell has a few flags, then lists are again attractive, esp. in numpy where you only need lists at the lowest level).

Alex Martelli 2009-06-29 14:44:42

Thanks for this idea. I've ended up using a 3d numpy array of sets (on the assumption that a set is better suited for this than a list).

Alex Jurkiewicz 2009-06-29 15:10:32

erm, sets sucked. Sample code I pasted in the question uses the magic of python long's. Thanks for the numpy array tip.

Alex Jurkiewicz 2009-06-29 15:31:44

Answer 6

+1 A:

Taking Robbie's suggestion one step further...

flags = set()
x, y, flag = 34, 201, 3
flags.add((x, y, flag)) # set flag 3 at position (34, 201)
if (3, 2, 1) in flags: # check if flag 1 is at position (3, 2)
    # do something
else:
    # do something else

You can also create a helper class.

class Flags(object):
    def __init__(self):
        self.data = set()
    def add(self, x, y, flag):
        self.data.add((x, y, flag))
    def remove(self, x, y, flag):
        self.data.remove((x, y, flag))
    def contains(self, x, y, flag):
        return (x, y, flag) in self.data

You could also implement Python's special methods like __contains__ to make it easier to work with.

FogleBird 2009-06-29 14:13:39

ansaurus

tags:

views:

answers:

Flags in Python

related questions