tags:

views:

584

answers:

7

How would you go about proving that two queries are functionally equivalent, eg they will always both return the same result set.


As I had a specific query in mind when I was doing this, I ended up doing as @dougman suggested, over about 10% of rows the tables concerned and comparing the results, ensuring there was no out of place results.

+2  A: 

This sounds to me like a an NP complete problem. I'm not sure there is a sure fire way to prove this kind of thing

Rik
A: 

The DBMS vendors have been working on this for a very, very long time. As Rik said, it's probably an intractable problem, but I don't think any formal analysis on the NP-completeness of the problem space has been done.

However, your best bet is to leverage your DBMS as much as possible. All DBMS systems translate SQL into some sort of query plan. You can use this query plan, which is an abstracted version of the query, as a good starting point (the DBMS will do LOTS of optimization, flattening queries into more workable models).

NOTE: modern DBMS use a "cost-based" analyzer which is non-deterministic across statistics updates, so the query planner, over time, may change the query plan for identical queries.

In Oracle (depending on your version), you can tell the optimizer to switch from the cost based analyzer to the deterministic rule based analyzer (this will simplify plan analysis) with a SQL hint, e.g.

SELECT /*+RULE*/ FROM yourtable

The rule-based optimizer has been deprecated since 8i but it still hangs around even thru 10g (I don't know 'bout 11). However, the rule-based analyzer is much less sophisticated: the error rate potentially is much higher.

For further reading of a more generic nature, IBM has been fairly prolific with their query-optimization patents. This one here on a method for converting SQL to an "abstract plan" is a good starting point: http://www.patentstorm.us/patents/7333981.html

Matt Rogish
A: 

You don't.

If you need a high level of confidence that a performance change, for example, hasn't changed the output of a query then test the hell out it.

If you need a really high level of confidence .. then errrm, test it even more.

Massive level's of testing aren't that hard to cobble together for a SQL query. Write a proc which will iterate around a large/complete set of possible paramenters, and call each query with each set of params, and write the outputs to respective tables. Compare the two tables and there you have it.

It's not exactly scientific, which I guess was the OP's question, but I'm not aware of a formal method to prove equivalency.

Michael OShea
+3  A: 

The best you can do is compare the 2 query outputs based on a given set of inputs looking for any differences. To say that they will always return the same results for all inputs really depends on the data.

For Oracle one of the better if not best approaches (very efficient) is here (Ctrl+F Comparing the Contents of Two Tables):
http://www.oracle.com/technology/oramag/oracle/05-jan/o15asktom.html

Which boils down to:

select c1,c2,c3, 
       count(src1) CNT1, 
       count(src2) CNT2
  from (select a.*, 
               1 src1, 
               to_number(null) src2 
          from a
        union all
        select b.*, 
               to_number(null) src1, 
               2 src2 
          from b
       )
group by c1,c2,c3
having count(src1) <> count(src2);
Dougman
A: 

Perhaps you could draw (by hand) out your query and the results using Venn Diagrams, and see if they produce the same diagram. Venn diagrams are good for representing sets of data, and SQL queries work on sets of data. Drawing out a Venn Diagram might help you to visualize if 2 queries are functionally equivalent.

Kibbee
+3  A: 

This is pretty easy to do.

Lets assume your queries are named a and b

a minus b

should give you an empty set. If it does not. then the queries return different sets, and the result set shows you the rows that are different.

then do

b minus a

that should give you an empty set. If it does, then the queries do return the same sets. if it is not empty, then the queries are different in some respect, and the result set shows you the rows that are different.

EvilTeach
In SQL Server, You can use EXCEPT to for this approach- http://msdn.microsoft.com/en-us/library/ms188055%28SQL.90%29.aspx
Russ Cam
This only proves equivalency for that particular set, not for the query in general.
Rik
@rik yep... that is the intent. i doubt anyone can do proofs with the relational calculus
EvilTeach
A: 

This will do the trick. If this query returns zero rows the two queries are returning the same results. As a bonus, it runs as a single query, so you don't have to worry about setting the isolation level so that the data doesn't change between two queries.

select * from ((<query 1> MINUS <query 2>) UNION ALL (<query 2> MINUS <query 1>))

Here's a handy shell script to do this:

#!/bin/sh

CONNSTR=$1
echo query 1, no semicolon, eof to end:; Q1=`cat` 
echo query 2, no semicolon, eof to end:; Q2=`cat`

T="(($Q1 MINUS $Q2) UNION ALL ($Q2 MINUS $Q1));"

echo select 'count(*)' from $T | sqlplus -S -L $CONNSTR
Mark Harrison