ansaurus

Question

Is there a good way to check rules against n columns?

Answer 1

A:

Are there too many rows/rules? If it's not the case (that's subjective, but say less than 10,000), you could create indexes for all columns.

That would increase speed significantly and indexes won't take much space.

If you don't plan to make a huge table of rules, then I bet your approach is ok provided you index all columns.

Seb 2009-04-14 19:54:01

In Oracle, nulls are not indexed. Then the IS NULL cannot take advantage on indexes.

FerranB 2009-04-14 19:55:25

Answer 2

+1 A:

The first query you provided is perfect. I really doubt that adding the column you were speaking of would give you any more speed, since the NOT NULL property of every entry is checked anyway, since every comparison to NULL yields false. So I would guess that x=y is expanded to x IS NOT NULL AND x=y internally. Maybe someone else can clarify that.

All other optimizations I can think of would involve precalculation or caching. You can create [temporary] tables matching certain rules or add further columns holding matching rules.

soulmerge 2009-04-14 19:55:00

Answer 3

A:

Why not make indices of your rules table by the values? Then you can

SELECT myvalue FROM RULES_A

Charlie Martin 2009-04-14 19:57:42

So, you are saying select against 3 separate tables rather than a single table? (Or 15 tables as in the actual case)

jmucchiello 2009-04-14 20:14:22

Answer 4

A:

It sounds like what you really have are Rules and Rule Sets. Modeling it that way will not only make this particular coding much simpler, but will also make the model expandable when you decide that you need a 16 columns.

For example:

CREATE TABLE Rules (
    rule_id   INT   NOT NULL,
    rule_category CHAR(1)  NOT NULL, -- This is like your column idea
    rule_int_value INT   NULL,
    rule_str_value VARCHAR(20) NULL,
    CONSTRAINT PK_Rules PRIMARY KEY CLUSTERED (rule_id),
    CONSTRAINT CK_Rules_one_value CHECK (rule_int_value IS NULL OR rule_str_value IS NULL)
)

CREATE TABLE Rule_Sets (
    rule_set_id INT NOT NULL,
    rule_id  INT NOT NULL,
    CONSTRAINT PK_Rule_Sets PRIMARY KEY CLUSTERED (rule_set_id, rule_id)
)

Some data that would match your given rules:

INSERT INTO Rules (rule_id, rule_category, rule_int_value, rule_str_value)
VALUES (1, 'A', 50, NULL)
INSERT INTO Rules (rule_id, rule_category, rule_int_value, rule_str_value)
VALUES (2, 'A', 51, NULL)
INSERT INTO Rules (rule_id, rule_category, rule_int_value, rule_str_value)
VALUES (3, 'B', NULL, 'xyz')
INSERT INTO Rules (rule_id, rule_category, rule_int_value, rule_str_value)
VALUES (4, 'C', 123, NULL)
INSERT INTO Rules (rule_id, rule_category, rule_int_value, rule_str_value)
VALUES (5, 'C', 456, NULL)

INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (1, 1)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (2, 2)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (2, 3)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (3, 2)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (3, 4)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (4, 3)
INSERT INTO Rule_Sets (rule_set_id, rule_id) VALUES (4, 5)

A test script that confirms the same answer that you expect:

DECLARE
    @a INT,
    @b VARCHAR(20),
    @c INT

SET @a = 50
SET @b = 'xyz'
SET @c = 456

SELECT DISTINCT
    rule_set_id AS failed_rule_set_id
FROM
    Rule_Sets RS
WHERE
    NOT EXISTS (SELECT * FROM Rules R WHERE R.rule_id = RS.rule_id AND @a = R.rule_int_value) AND
    NOT EXISTS (SELECT * FROM Rules R WHERE R.rule_id = RS.rule_id AND @b = R.rule_str_value) AND
    NOT EXISTS (SELECT * FROM Rules R WHERE R.rule_id = RS.rule_id AND @c = R.rule_int_value)

If you can present the input data in a set-based form rather than as individual parameters then the final SQL statement can be more dynamic and wouldn't have to grow as you add additional columns.

Tom H. 2009-04-14 20:29:25

But does this run faster than the "is equal or is null" SQL? In the real model there will be 15 columns, not 3. There are some columns where there are few values and this is useful but there are other columns where the values are probably unique by rule. I'll have to run some models....

jmucchiello 2009-04-14 22:00:57

Actually this doesn't work. It return 2,3. I want 1,4 to be returned as in my example. I still need to wrap it in a "select ... not in ()"

jmucchiello 2009-04-14 22:10:02

I was going by your last statement that you wanted a list of all of the rules for which the data fails to pass. If your database is properly indexed I imagine most of the methods will work ok. The real problem is that your criteria is coming in as many variables instead of a set.

Tom H. 2009-04-15 03:14:48

Answer 5

A:

SELECT * FROM RULES
 WHERE (A = :a OR A IS NULL)
   AND (B = :b OR B IS NULL)
   AND (C = :c OR C IS NULL);

Depending on your RBDMS, this might or might not be more efficient, though not by much:

SELECT * FROM RULES
 WHERE coalesce(A, :a) = :a
   AND coalesce(B, :b) = :b 
   AND coalesce(C, :c) = :c ;

In MySQL (your RBDMS may do this differently), this query allows an index scan rather than a ref_or_null scan, if there is an applicable index. If the index covers all columns, it allows the entire index to be used (and indeed, if the index covers all columns, the index is the table).

With your query, a ref_or_null access is done rather than an index access, and only the first column in a multi-column index is used. With ref_or_null, MySQL has to search the index for matches, then search again for nulls. So we use the index twice, but never use the whole index.

But with coalesce, you have the overhead of executing the coalesce function on each column value. Which is faster probably depends on how many rules you have, how many columns in each row, and the index used, if any.

Whether it's more readable is a matter of opinion.

tpdi 2009-04-14 20:59:49

ansaurus

tags:

views:

answers:

Is there a good way to check rules against n columns?

related questions