I have a class of data with a very large number of binary properties--151 (!) to be exact--and I am concerned with how to model this data structurally. Despite the internal efficiencies of storing bit-fields as bytes, my programming spidey senses are tingling at creating a table with 151 bit-fields (in addition to other properties).

There will not be a large number of rows (perhaps 1000), and once sent into production the data will not change very often.

I've thought of categorizing my data into disjoint subclasses and creating separate tables, but splitting the properties in this manner is impracticable and, even if possible, certainly would not map well onto the data subclasses. I'd also like to keep all the data together and avoid field and/or row duplication. I have also considered using a custom binary format, but this is not workable because the key field in my data is used as a foreign key in other tables.

Queries will make heavy use of WHERE clauses to extract relevant data. I've considered using multiple long or int fields, but I've rejected this as unworkable since I know of no bitwise AND operators or functions in SQL and, as noted above, classifying the properties is problematic, not to mention the other major software engineering issues with this method.

I will be using PostgreSQL.

So, my question is: do I just make a table with a huge number of fields, or are there other methods compatible with the relational model?

+1  A: 

Why couldn't you use bitwise operators?

Operator  Description  Example  Result
&         bitwise AND  91 & 15  11
|         bitwise OR   32 | 3   35
#         bitwise XOR  17 # 5   20
~         bitwise NOT  ~1       -2

from: http://www.postgresql.org/docs/7.4/static/functions-math.html

I would think that you could maybe group them into smaller groups, but other than doing that I don't know of another way.
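A minimal sketch of the "smaller groups" idea, assuming the 151 flags are numbered 0..150 and packed into three integer columns. The table name `item`, the column names `flags_a`/`flags_b`/`flags_c`, and the helper functions are all hypothetical. It uses Python's built-in sqlite3 only so that it runs anywhere; the `&` test in the WHERE clause is the same operator listed in the PostgreSQL table above.

```python
import sqlite3

# Assumption: flags 0..150 are split across three integer columns,
# 63 flags per column so the sign bit of a 64-bit integer is never used.
BITS_PER_COLUMN = 63
COLUMNS = ("flags_a", "flags_b", "flags_c")  # hypothetical column names


def locate(flag_index):
    """Return (column name, single-bit mask) for a flag number 0..150."""
    column, bit = divmod(flag_index, BITS_PER_COLUMN)
    return COLUMNS[column], 1 << bit


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, "
    "flags_a INTEGER NOT NULL DEFAULT 0, "
    "flags_b INTEGER NOT NULL DEFAULT 0, "
    "flags_c INTEGER NOT NULL DEFAULT 0)"
)


def insert_item(item_id, flag_indexes):
    """Pack the given flag numbers into the three columns and insert a row."""
    values = dict.fromkeys(COLUMNS, 0)
    for index in flag_indexes:
        column, mask = locate(index)
        values[column] |= mask
    conn.execute(
        "INSERT INTO item (id, flags_a, flags_b, flags_c) VALUES (?, ?, ?, ?)",
        (item_id, values["flags_a"], values["flags_b"], values["flags_c"]),
    )


def ids_with_flag(flag_index):
    """Return the ids of all rows that have the given flag set."""
    column, mask = locate(flag_index)
    # The column name comes from the fixed COLUMNS tuple, never from user input.
    rows = conn.execute(
        f"SELECT id FROM item WHERE ({column} & ?) <> 0 ORDER BY id", (mask,)
    )
    return [row[0] for row in rows]


insert_item(1, [0, 70, 150])
insert_item(2, [70])
insert_item(3, [5])

print(ids_with_flag(70))   # [1, 2]
print(ids_with_flag(150))  # [1]
```

The obvious cost is that every query has to know which column a flag lives in, which is the bookkeeping the `locate` helper hides.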

rball
I could use them. This is embarrassing.
gvkv
+1  A: 

PostgreSQL supports bits and bit strings natively.

HUAGHAGUAH
So it does. Thanks.
gvkv
+2  A: 

The biggest problem I see is the obvious fact that the cardinality of single-field indexes is, to say the least, low. Maybe you can describe the data a bit more and we can discuss other designs? Are all these independent of each other, for instance?

With only 1000 rows, it might be simpler to store this somewhere other than the database (although I imagine there are lots of join opportunities?). Not for query efficiency reasons, but because it doesn't really look like database data.
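As a rough illustration of keeping the flags outside the database, here is a sketch that holds one integer mask per key in memory and returns the matching keys, which could then be used in queries against the other tables. The sample keys, flag numbers, and function names are made up for the example; Python integers are arbitrary precision, so a single value can hold all 151 bits.

```python
# Hypothetical sample data: primary key -> flag numbers (0..150) that are true.
RAW = {
    101: {0, 7, 150},
    102: {7},
    103: {3, 150},
}


def to_mask(flag_indexes):
    """Combine flag numbers into one integer with those bits set."""
    mask = 0
    for index in flag_indexes:
        mask |= 1 << index
    return mask


# One arbitrary-precision integer per row.
MASKS = {key: to_mask(flags) for key, flags in RAW.items()}


def keys_matching(required=(), forbidden=()):
    """Return sorted keys whose mask has every required bit and no forbidden bit."""
    need = to_mask(required)
    deny = to_mask(forbidden)
    return sorted(
        key
        for key, mask in MASKS.items()
        if (mask & need) == need and (mask & deny) == 0
    )


print(keys_matching(required=[7]))                  # [101, 102]
print(keys_matching(required=[7, 150]))             # [101]
print(keys_matching(required=[150], forbidden=[7])) # [103]
```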

le dorfier
+1. agree that a DB might not be the best place for this data. Bit-wise testing using appropriate masks would seem a better fit.
Mitch Wheat
That was actually my original plan but I need my key fields as foreign keys in other tables. Anyway, since bit-wise operators are supported the point is now moot. My structure becomes obvious.
gvkv
Hmmm... Your key values are just as useful no matter where you get them from. And I don't understand " ... since bitwise operators are supported ... " Do you mean because now you can, you must? Pardon, but I'm not following your argument. But I'll take your word that your structure becomes obvious.
le dorfier
My primary data field is involved in join tables (for many-to-many relationships). If this data is not in the database, how then can I use them? The reason you're not following my argument is because I haven't made one! It's just that I now feel confident using my original plan.
gvkv
Clarification: presumably any boolean selection mechanism will give you keys for use as "foreign" keys into your other tables. But good luck! (With only 1K records indexes and cardinality will hopefully be inconsequential.)
le dorfier
+1  A: 

Model your data in the way most appropriate for your problem domain. You don't have much data here: in a worst-case scenario, assuming each of your roughly 1000 rows takes up 200 bytes, you're looking at about 200 KB of data. That's a trivial amount even if your particular database doesn't implement Boolean properties efficiently.

On the other hand, having 150 boolean properties sounds somewhat suspicious; perhaps your data model can be further normalized?
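One possible reading of "further normalized" is a property table plus a join table, with a row present only where a property is true. The sketch below assumes that reading; the table names (`item`, `property`, `item_property`), the sample properties, and the helper are all hypothetical, and sqlite3 stands in for PostgreSQL only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE property (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- A row here means "this item has this property"; absence means false.
    CREATE TABLE item_property (
        item_id INTEGER NOT NULL REFERENCES item(id),
        property_id INTEGER NOT NULL REFERENCES property(id),
        PRIMARY KEY (item_id, property_id)
    );
    """
)
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO property VALUES (?, ?)",
    [(10, "red"), (11, "round"), (12, "heavy")],
)
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 11), (3, 10), (3, 11), (3, 12)],
)


def items_with_all(property_names):
    """Return ids of items having every named property (names must be distinct)."""
    placeholders = ", ".join("?" for _ in property_names)
    query = f"""
        SELECT ip.item_id
        FROM item_property AS ip
        JOIN property AS p ON p.id = ip.property_id
        WHERE p.name IN ({placeholders})
        GROUP BY ip.item_id
        HAVING COUNT(*) = ?
        ORDER BY ip.item_id
    """
    rows = conn.execute(query, (*property_names, len(property_names)))
    return [row[0] for row in rows]


print(items_with_all(["red", "round"]))  # [1, 3]
print(items_with_all(["heavy"]))         # [3]
```

Adding a 152nd property then becomes an INSERT into `property` rather than a schema change, at the price of a join and a GROUP BY on every lookup.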

Scott Weinstein
Size is not my concern though I agree 150 properties is suspicious. Normalizing the bit-fields and creating an extra join table is not worthwhile since the bit-fields have no other properties. Anyway, my resolved ignorance of bit-wise operators renders my question null.
gvkv