views:

169

answers:

5

The PROBLEM

I have a query like this:

select a.id from a join b on ( a.id = b.my_a ) join ....
where 
    ( /* complex and expensive conditional */ )
    AND 
       (( /* conditional #1 */ ) 
        OR ( /* conditional #2 */ )
        OR ( /* conditional #3 */))

I would like to have the query return something like:

select a.id, conditional_1_eval_value, conditional_2_eval_value, conditional_3_eval_value from a join b on ( a.id = b.my_a ) join ....
where 
    ( /* complex and expensive conditional */ )
    AND 
       (( /* conditional #1 */ ) 
        OR ( /* conditional #2 */ ) 
        OR ( /* conditional #3 */))

where conditional_1_eval_value, conditional_2_eval_value, and conditional_3_eval_value are set to TRUE, FALSE, NULL. NULL indicating that the conditional was not evaluated.

So the results set might be:

1, FALSE, NULL, TRUE ( condition_1, condition_3 were evaluate, condition_2 was not)
2, NULL, TRUE, TRUE ( condition_2, condition_3 were evaluate, condition_1 was not)
3, TRUE, FALSE, FALSE (all were evaluated)

condition_1, condition_2, condition_3 are complex themselves involving correlated subqueries and grouping.

EDIT:

What am I trying to accomplish?

We need to log which conditional caused the row to be returned. We don't need to know all the reasons why the row was returned. So in the second row of the results example, it is enough to know that conditional_2 and conditional_3 were both true. Not knowing the what conditional_1 value is does not matter.

It is enough to know that at least one conditional was satisfied and what that one conditional was.

Nonoptimal solutions

Obviously I could do this with a UNION like this:

select a.id, TRUE, NULL, NULL from a join b on ( a.id = b.my_a ) join ....
where 
    ( /* complex and expensive conditional */ )
    AND 
       ( /* conditional #1 */ )
UNION
select a.id, NULL, TRUE, NULL from a join b on ( a.id = b.my_a ) join ....
where 
    ( /* complex and expensive conditional */ )
    AND 
        ( /* conditional #2 */ )
UNION
select a.id, NULL, NULL, TRUE from a join b on ( a.id = b.my_a ) join ....
where 
    ( /* complex and expensive conditional */ )
    AND 
        ( /* conditional #3 */)

But this would mean that:

  1. the common "complex and expensive conditional" is evaluated 3 times.
  2. that all conditionals are evaluated even when another conditional has already satisfied the OR.
  3. there would be a maintenance nightmare insuring that the 3 copies of the common complex query are identical ( o.k. solvable by constructing sql in code and copying the common string -- but that means I would violate another internal standard of all sql not being embedded in java but being in a visible-to-DBAs xml file)

Using a CASE in the select that duplicates each conditional 1 through 3 avoids the common condition being evaluated 3 times. However, the complexity of conditional 1-3 is such that it may not be possible.

Using a select in the FROM clause, would be awkward and may not be possible because a FROM SELECT cannot be a correlated query. I am not certain that I can construct a useful noncorrelated query.

Stored procedures would work. However, this would be out first such stored procedure and would increase our deployment complexity significantly.

Doing the conditional_1, conditional_2, conditional_3 evaluation in java code. This is what we are currently doing and it runs sloooooooow. Lots of data transferred when the database is designed to filter the results set -- should not be doing this in java!

Solution suggestions?

Anyone?

I should also add that I welcome answers that say this problem cannot be solved. Knowing that the problem cannot be solved would save me time trying to solve it with strictly SQL.

If I had to chose, I would lean toward learning what the mysql stored procedure would look like.

So if you want to volunteer what the mysql stored procedure would look like that would be great.

A: 

Could you just select all data relative to the results then perform that conditional logic in your client code working off the result set?

Chris Klepeis
No. Conditional_1, Conditional_2, and conditional_3 are non-trivial. Doing this in code is expensive and would result in a lot of data being transferred.
Pat
+1  A: 

Franky, I don't see how I could express the problem in MySQL (or in fact in any SQL). However, I was faced with similarly complicated, large dataset evaluation problem couple of years ago1.

Based on the experiences gathered, I can give some ideas about how I would speed up the evaluation:

  • I would consider switching to another database engine (replacing the current one or just copy the data into it) - I would use Oracle as I know its capabilities in terms of query optimization. Other option would be to use an embedded sql engine to bring the data closer to the place of computation.
  • I would have a second look at the current Java based evaluation. Maybe tuning the query batch sizes, having parallel-entangled queries of various tables involved in the problem and use a streaming based approach.
  • If I had some decent amount of memory available for my Java code, I would consider keeping some/all data cached in memory all the time if the computation needs to be performed frequently.
  • Or would look for a way to take out the common parts of the conditions to gain some speed by sharing it between the conditions.

1 actually the constraint was to have a fast on-demand computation on a ~1M record set.

kd304
No. The conditionals are not variable. But processing of each row changes depending on which of the three conditionals is satisfied.Will edit question.
Pat
We are a startup. The solution has to be achievable in a few hours. Not a few days. Right now it looks like a stored function is leading the way. Naturally I will try to reduce/consolidate the query but I have to operate on the assumption that I will not be able to in the allocated time.Re: Java based evaluation; that is what we are trying to replace.
Pat
A: 

If you're looking to reduce the retyping of complex expressions, put the expressions in a subquery derived table:

SELECT t.*
FROM (
    SELECT a.*, b.*, ...
        /* conditional #1 */ AS c1,
        /* conditional #2 */ AS c2,
        /* conditional #3 */ AS c3
    FROM a JOIN b ON (a.id = b.my_a)
    ...) AS t
WHERE /* ...other conditions... */
    AND ((c1) OR (c2) OR (c3));

This is also a solution to a different question, "how can I use column aliases in WHERE conditions?"

Bill Karwin
you have FROM (SELECT /* conditional #1 */ AS c1, ) ... so what do I replace /* conditional #1 */ with?
Pat
Also as referenced in the question FROM SELECTs must be non-correlated queries. If I can come up with a FROM SELECT query, then that would be my preferred solution.
Pat
+2  A: 

To achieve what your trying to do, why not use Stored Functions for the complex conditions?

Which would result in a select statement like:


select func1(arg1, arg2, ...), func2(arg1, arg2, ...), rest_of_select_columns 
from table1, table2
where (complex1 logic) 
OR func1(arg1, arg2, ....) = 1 /* return to give true */ 
OR func2(arg1, arg2, ....) = 1 

Notes:

  1. SQL does not support a Boolean data type, so true, false as results from the function is not possible. Hence, the 0,1 return values.
  2. Depending on your version of mySQL you could make the functions DETERMINISTIC, which could result in some performance improvements.
Aussie Craig
So these functions would act the same as CASE would conceptually but with better performance? It would avoid the problem of duplicating the conditionals by encapsulating the conditionals in functions.[Stored procedures were listed under non-optimal solutions in the question]
Pat
In part it would act like the case. The use of DETERMINISTIC (if mySQL does it "right") and if the conditions are "deterministic", would result in the query running significantly faster. In Oracle (and I'm assuming mySQL is the same), when the Query Engine "sees" a previously "done" argument list, it substitutes the previous result (saving a lot when the same argument list happens on more than one row).
Aussie Craig
RE: Not wanting to use stored procedures. I'm sorry, but the only way I can see to "skate over the thin ice" of SQL not having a true boolean column type (1=1 as select column is not supported), is to use functions which use numeric return type.
Aussie Craig
Well TRUE FALSE NULL are just example values. "foo", "bar" and "baz" would work just as well. Its the signal that is important- not the form of the signal.
Pat
+1  A: 

You are right that pulling all the data back to java and executing your conditions would be a dog.

However, you're only real choice is to UNION the 3 different queries. Because of how the relational engine works there's no way to pull out what "hit" on the record.

Chris Lively
man, you are not supposed to say that :-)
Pat