views:

424

answers:

5

I want to create an object in python that is a collection of around 200,000,000 true/false values. So that I can most effectively change or recall any given true/false value, so that I can quickly determine if any given number, like 123,456,000 is true or false or change its value.

Is the best way to do this a list? or an array? or a class? or just a long int using bit operations? or something else?

I'm a bit noob so you may have to spell things out for me more than if I were asking the question in one of the other languages I know better. Please give me examples of how operating on this object would look.

Thanks

+3  A: 

Have you considered using a lightweight database like SQLite?

jldupont
+1. 200 million bits is about 24 megabytes of data -- while this could easily fit in memory on a modern machine, anytime you get to that size of structure in memory, you probably need to at least consider whether a database would be a better solution.
Daniel Pryden
+11  A: 

You can try the bitarray module, or write a similar thing using an array of integers yourself.

Lukáš Lalinský
Thanks! I've never installed a Module before and am having a bit of trouble: http://superuser.com/questions/56316/python-module-trouble-installing-bitarray
Dan
+5  A: 

"quickly determine if any given number, like 123,456,000 is" in the "true" set or "false" set.

This is what a set is for.

The "true" set is a set of all the numbers.

To make a number's boolean flag "true", add it to the true set.

To make a number's boolean flag "false", remove it from the true set.

Life will be much simpler.

S.Lott
+1: it is simple to use and might be sufficiently efficient for a sparse list
J.F. Sebastian
Let's say half of the values are true. Size of the int object is 12 bytes, that's 1.2GB just for storing the keys + additional memory for the actual hash table. Using a bit array, the memory usage will be 25MB. I think that's a significant difference.
Lukáš Lalinský
@Lukáš Lalinský: Your analysis is good. However, I don't think it's relevant unless your processor has no memory available. On most modern processors, there's plenty of memory and the 25M vs. 1.2G doesn't really matter very much at all.
S.Lott
Well, I have trouble calling wasting 1GB of RAM the best way to do something. :) Sure, it's a simple way and it works well if you have much less true values than false values, but it's not the best way.
Lukáš Lalinský
Since the RAM is not a consumable resource, I can't apply "waste" to it -- unless you're tearing it out of the processor and throwing it away. Memory is a resource, like time that is part of the trade-off equation. Memory -- in this case -- may be used so that less time is taken trying to manage individual bits in a bit array.
S.Lott
+1  A: 

At first glance, the Python BitVector module sounds like it does exactly what you want. It's available at http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.5.1.html and since it is pure Python code, it will run on any platform with no compiling needed.

You mentioned that you need some speed in getting and setting any arbitrary true-false value. For that you need to use a Python array, rather than a list, and if you go to the above URL and browse the source code for BitVector you can see that it does indeed rely on Python arrays.

Ideally, you would encapsulate what you are doing in a class that subclasses from BitVector, i.e.

class TFValues(BitVector):
   pass

That way you can do things like add a list to contain associated info such as the name of a particular TF value.

Michael Dillon
+3  A: 

You might also like to try the bitstring module, which is pure Python. Internally it's all stored as a byte array and the bit masking and shifting is done for you:

from bitstring import BitString
# Initialise with two hundred million zero bits
s = BitString(200000000)
# Set a few bits to 1
s.set(1, [76, 33, 123456000])
# And test them
if s.all([33, 76, 123456000]):
    pass

The other posters are correct though that a simple set might be a better solution to your particular problem.

Scott Griffiths