views:

108

answers:

4

Say you had a long array of chars that are either 1 or 0, kind of like a bitvector, but on a database column. How would you query to know what values are set/no set? Say you need to know if the char 500 and char 1500 are "true" or not.

+3  A: 

In MySQL, something using substring like

select foo from bar 
where substring(col, 500,1)='1' and substring(col, 1500,1)='1';

This will be pretty inefficient though, you might want to rethink your schema. For example, you could store each bit separately to tradeoff space for speed...

create table foo
(
   id int not null,
   bar varchar(128),
   primary key(id)
);

create table foobit
(
   int foo_id int not null,
   int idx int not null,
   value tinyint not null,

   primary key(foo_id,idx),
   index(idx,value)
);

Which would be queried

   select foo.bar from foo
   inner join foobit as bit500
      on(foo.id=bit500.foo_id and bit500.idx=500)
   inner join foobit as bit1500
      on(foo.id=bit1500.foo_id and bit1500.idx=1500)
   where
      bit500.value=1 and bit1500.value=1;

Obviously consumes more storage, but should be faster for those query operations as an index will be used.

Paul Dixon
+2  A: 

I would convert the column to multiple bit-columns and rewrite the relevant code - Bit masks are so much faster than string comparisons. But if you can't do that, you must use db-specific functions. Regular expressions could be an option

-- Flavor: MySql
SELECT * FROM table WHERE column REGEXP "^.{499}1.{999}1"
soulmerge
+6  A: 
SELECT
  Id
FROM
  BitVectorTable
WHERE
  SUBSTRING(BitVector, 500, 1) = '1'
  AND SUBSTRING(BitVector, 1000, 1) = '1'

No index can be used for this kind of query, though. When you have many rows, this will get slow very quickly.

Edit: On SQL Server at least, all built-in string functions are deterministic. That means you could look into the possibility to make computed columns based on the SUBSTRING() results for the whole combined value, putting an index on each of them. Inserts will be slower, table size will increase, but searches will be really fast.

SELECT
  Id
FROM
  BitVectorTable
WHERE
  BitVector_0500 = '1'
  AND BitVector_1000 = '1'

Edit #2: The limits for SQL Server are:

  • 1,024 columns per normal table
  • 30.000 columns per "wide" table
Tomalak
+1  A: 
select substring(your_col, 500,1) as char500,
substring(your_col, 1500,1) as char1500 from your_table;
tehvan