Update:
See this entry in my blog for performance details:
SELECT * FROM table WHERE field & number = number
SELECT * FROM table WHERE field | number = number
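In bit terms, the two predicates test subset relations between the bits of field and number: the AND form requires every bit of number to be set in field, the OR form requires every bit of field to be set in number. A short Python sketch with illustrative values (not from the question):

```python
field, number = 0b1011_0110, 0b0011_0100   # illustrative 8-bit values

# field & number == number: every bit set in number is also set in field
assert field & number == number            # holds for these values
assert number & ~field == 0                # equivalent formulation

# field | number == number: every bit set in field is also set in number
assert (field | number) != number          # fails: field has bits outside number
assert field & ~number != 0                # the offending bits: 0b1000_0010
```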
This index can be effective in two ways:
1. To avoid early table scans (since the value to compare is contained in the index itself)
2. To limit the range of values examined.
Neither condition in the queries above is sargable; that is, the index will not be used for a range scan (with the conditions as they are now). However, point 1 still holds, and the index can still be useful.
If your table contains, say, 100 bytes per row on average and 1,000,000 rows, a full table scan will need to read 100 MB of data. With an index (a 4-byte key, a 6-byte row pointer, and some internal overhead), the query will need to scan only about 10 MB of data, plus extra reads from the table for each row that passes the filter.
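The arithmetic above as a quick sanity check, using the example's assumed sizes and ignoring index overhead:

```python
rows = 1_000_000
row_bytes = 100                  # average row size assumed in the example
entry_bytes = 4 + 6              # 4-byte key + 6-byte row pointer, overhead ignored

table_scan_mb = rows * row_bytes // 10**6    # full table scan: 100 MB
index_scan_mb = rows * entry_bytes // 10**6  # full index scan: 10 MB
```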
- The table scan is more efficient if your condition is not selective (you have a high probability of matching the condition).
- The index scan is more efficient if your condition is selective (you have a low probability of matching the condition).
Both these queries will require scanning the whole index.
But by rewriting the AND query you can also benefit from a range scan on the index.
This condition:
field & number = number
can only match a field in which the highest set bits of number are set too.
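To see this on a concrete value (an illustrative 8-bit number, not from the question): the set bits at the top of number form a contiguous run that every matching field must also contain, so field can never fall below that run's value.

```python
number = 0b1101_0000      # illustrative; its highest zero bit is bit 5
top_run = 0b1100_0000     # the contiguous set bits above that zero bit

# Every field matching the original predicate also satisfies the range bound.
for field in range(256):
    if field & number == number:
        assert field >= top_run
```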
So you just need to add this extra condition to the query:
SELECT *
FROM table
WHERE field & number = number
AND field >= 0xFFFFFFFF & ~((2 << FLOOR(LOG(2, 0xFFFFFFFF & ~number))) - 1)
This will use the range for coarse filtering and the condition for fine filtering.
The more of number's high-order bits are set (that is, the more its unset bits are confined to the low end), the narrower the range and the better this works.
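The WHERE expression translates almost directly into Python, which makes it easy to brute-force-check. In this sketch, lower_bound is a hypothetical helper name, and SQL's FLOOR(LOG(2, x)) becomes x.bit_length() - 1:

```python
def lower_bound(number, width=32):
    """Mirror of 0xFFFFFFFF & ~((2 << FLOOR(LOG(2, 0xFFFFFFFF & ~number))) - 1)."""
    all_ones = (1 << width) - 1
    inv = all_ones & ~number            # 0xFFFFFFFF & ~number
    if inv == 0:                        # number has every bit set:
        return number                   # only field = number can match at all
    m = inv.bit_length() - 1            # FLOOR(LOG(2, inv)): highest unset bit
    return all_ones & ~((2 << m) - 1)   # keep only the bits above it

# Brute-force check over all 8-bit values: the range condition never
# excludes a row that the bitwise condition would accept.
for number in range(256):
    bound = lower_bound(number, width=8)
    assert all(field >= bound
               for field in range(256) if field & number == number)
```

For number = 0xFFFF0000 the bound is 0xFFFF0000, leaving only 1/65536 of the key space to scan; for a number whose top bit is unset the bound degenerates to 0 and the range adds nothing.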