views:

1070

answers:

6
+4  Q: 

Flags in Python

I'm working with a large matrix (250x250x30 = 1,875,000 cells), and I'd like a way to set an arbitrary number of flags for each cell in this matrix, in some manner that's easy to use and reasonably space efficient.

My original plan was a 250x250x30 list array, where each element was something like: ["FLAG1","FLAG8","FLAG12"]. I then changed it to storing just integers instead: [1,8,12]. These integers are mapped internally by getter/setter functions to the original flag strings. This only uses 250mb with 8 flags per point, which is fine in terms of memory.

My question is: am I missing another obvious way to structure this sort of data?

Thanks all for your suggestions. I ended up rolling a few suggestions into one, sadly I can only pick one answer and have to live with upvoting the others:

EDIT: erm the initial code I had here (using sets as the base element of a 3d numpy array) used A LOT of memory. This new version uses around 500mb when filled with randint(0,2**1000).

import numpy

FLAG1=2**0
FLAG2=2**1
FLAG3=2**2
FLAG4=2**3

(x,y,z) = (250,250,30)

array = numpy.zeros((x,y,z), dtype=object)


def setFlag(location,flag):
    array[location] |= flag
def unsetFlag(location,flag):
    array[location] &= ~flag
+1  A: 

BitSet is what you want, since it allows you to store many flags at once using only an fixed size integer (Int type)

dfa
+7  A: 

Your solution is fine if every single cell is going to have a flag. However if you are working with a sparse dataset where only a small subsection of your cells will have flags what you really want is a dictionary. You would want to set up the dictonary so the key is a tuple for the location of the cell and the value is a list of flags like you have in your solution.

allFlags = {(1,1,1):[1,2,3], (250,250,30):[4,5,6]}

Here we have the 1,1,1 cell have the flags 1,2, and 3 and the cell 250,250,30 have the flags 4,5, and 6

edit- fixed key tuples, thanks Andre, and dictionary syntax.

Robbie
He's using a three dimensional matrix, so you would need so say something like dict((1,1,1)=[1,2,3]...
Andre Miller
Actually, shouldn't that be {(1,1,1):[1,2,3], ...). dict(key=val) syntax only works if your keys are python identifiers.
Brian
+3  A: 

Consider using Flyweight pattern to share cell properties:

http://en.wikipedia.org/wiki/Flyweight_pattern

Perica Zivkovic
+5  A: 

You can define some constants with different, power of two values as:

FLAG1 = 0x01
FLAG8 = 0x02
FLAG12 = 0x04
...

And use them with boolean logic to store the flags in only one integer, p.e.:

flags = FLAG1 | FLAG8

To check if a flag is enabled, you can use the & operator:

flag1_enabled = flags & FLAG1

If the flag is enabled, this expression will return a non-zero value, that will be evaluated as True in any boolean operation. If the flag is disabled, the expression will return 0, that is evaluated as False in boolean operations.

Jaime Soriano
This saves me about 20% of the required memory even if I use up to 100 objects, very interesting! One question, how do I remove a flag?
Alex Jurkiewicz
Maybe not the best approach if you really might have up to 500 flags, although Python does support arbitrarily long integers.
FogleBird
Jaime Soriano
Kylotan
Thanks for this suggestion. Although I'm going to use sets instead of bitfields (I don't need the extra space savings here) I definitely will need something like this in future.
Alex Jurkiewicz
Actually, I am! see sample code in the question, and thanks for the idea.
Alex Jurkiewicz
+4  A: 

I would generally use a numpy array (presumably of short ints, 2 bytes each, since you may need more than 256 distinct values) -- that would take less than 4MB for the <2 million cells.

If for some reason I couldn't afford the numpy dependency (e.g on App Engine, which doesn't support numpy), I'd use the standard library array module - it only supports 1-dimensional arrays, but it's just as space-efficient as numpy for large homogeneous arrays, and the getter/setter routines you mention can perfectly well "linearize" a 3-items tuple that's your natural index into the single integer index into the 1-D array.

In general, consider numpy (or array) any time you have large homogeneous, dense vectors or matrices of numbers -- Python built-in lists are highly wasteful of space in this use case (due to their generality which you're not using and don't need here!-), and saving memory indirectly translates to saving time too (better caching, fewer levels of indirection, etc, etc).

Alex Martelli
Thanks for the numpy suggestion. One question about the data type. Since a ushort can hold values up to 2**16, can't I only have 16 flags if I use ushort?
Alex Jurkiewicz
Sorry... I was still thinking of bitflags as in a previous answer. I see what you mean now, a 4d array, where a[0][0][0] would return a 1d array correct?
Alex Jurkiewicz
Hmm, I had read your question differently, with just a single flag (or none) per cell (with up to 500 different values), as opposed to up to 500 flags for a single cell. For your actual question a 4-D array (with 512 bits per cell, 64 bytes) is indeed needed, so that's 128 MB (and the translation to/from bit by shift and mask, as other answers have mentioned) -- *IF* you do need to be as "dense" as that (if the _typical_ cell has a few flags, then lists are again attractive, esp. in numpy where you only need lists at the lowest level).
Alex Martelli
Thanks for this idea. I've ended up using a 3d numpy array of sets (on the assumption that a set is better suited for this than a list).
Alex Jurkiewicz
erm, sets sucked. Sample code I pasted in the question uses the magic of python long's. Thanks for the numpy array tip.
Alex Jurkiewicz
+1  A: 

Taking Robbie's suggestion one step further...

flags = set()
x, y, flag = 34, 201, 3
flags.add((x, y, flag)) # set flag 3 at position (34, 201)
if (3, 2, 1) in flags: # check if flag 1 is at position (3, 2)
    # do something
else:
    # do something else

You can also create a helper class.

class Flags(object):
    def __init__(self):
        self.data = set()
    def add(self, x, y, flag):
        self.data.add((x, y, flag))
    def remove(self, x, y, flag):
        self.data.remove((x, y, flag))
    def contains(self, x, y, flag):
        return (x, y, flag) in self.data

You could also implement Python's special methods like __contains__ to make it easier to work with.

FogleBird