views:

1452

answers:

3

I'd like to implement a bloom filter using MySQL (other a suggested alternative).

The problem is as follows:

Suppose I have a table that stores 8 bit integers, with these following values:

1: 10011010
2: 00110101
3: 10010100
4: 00100110
5: 00111011
6: 01101010

I'd like to find all results that are bitwise AND to this:

00011000

The results should be rows 1 and 5.

However, in my problem, they aren't 8 bit integers, but rather n-bit integers. How do I store this, and how do I query? Speed is key.

+2  A: 

Create a table with int field for value, no need to store numbers as sequence of 0 and 1. For you data it will look like this:

number

154
53
148
38
59
106

and you need to find all entries matching 24.

Then you can run a query like

SELECT * FROM test WHERE number & 24 = 24

If you want to avoid convertion into 10 base numbers in your application you can hand it over to mysql:

INSERT INTO test SET number = b'00110101';

and search like this

SELECT bin(number) FROM test WHERE number & b'00011000' = b'00011000'
alexeit
Thank you for the query advice. However what should I do if I want to store "n-bit" numbers that are longer than Integers (32 bits)... for example, 64 or 128 bits?
Sam
Mysql BIT datatype seem to support up to 64 bits.Does it mean you can only store up 64 items in bloom filter?
alexeit
I need to be able to store n-bits... this limits me to 64.
Sam
+2  A: 

Consider not using MySQL for this.

First off, there probably isn't a built-in way for more than 64-bit tables. You'd have to resort to user-defined functions written in C.

Second, each query is going to require a full table scan, because MySQL can't use an index for your query. So, unless your table is very small, this will not be fast.

derobert
+2  A: 

Switch to PostgreSQL and use bit(n)