ansaurus

Question

Answer 1

+2 A:

This sounds to me like a an NP complete problem. I'm not sure there is a sure fire way to prove this kind of thing

Rik 2008-09-11 15:40:22

Answer 2

A:

The DBMS vendors have been working on this for a very, very long time. As Rik said, it's probably an intractable problem, but I don't think any formal analysis on the NP-completeness of the problem space has been done.

However, your best bet is to leverage your DBMS as much as possible. All DBMS systems translate SQL into some sort of query plan. You can use this query plan, which is an abstracted version of the query, as a good starting point (the DBMS will do LOTS of optimization, flattening queries into more workable models).

NOTE: modern DBMS use a "cost-based" analyzer which is non-deterministic across statistics updates, so the query planner, over time, may change the query plan for identical queries.

In Oracle (depending on your version), you can tell the optimizer to switch from the cost based analyzer to the deterministic rule based analyzer (this will simplify plan analysis) with a SQL hint, e.g.

SELECT /*+RULE*/ FROM yourtable

The rule-based optimizer has been deprecated since 8i but it still hangs around even thru 10g (I don't know 'bout 11). However, the rule-based analyzer is much less sophisticated: the error rate potentially is much higher.

For further reading of a more generic nature, IBM has been fairly prolific with their query-optimization patents. This one here on a method for converting SQL to an "abstract plan" is a good starting point: http://www.patentstorm.us/patents/7333981.html

Matt Rogish 2008-09-11 18:42:58

Answer 3

A:

You don't.

If you need a high level of confidence that a performance change, for example, hasn't changed the output of a query then test the hell out it.

If you need a really high level of confidence .. then errrm, test it even more.

Massive level's of testing aren't that hard to cobble together for a SQL query. Write a proc which will iterate around a large/complete set of possible paramenters, and call each query with each set of params, and write the outputs to respective tables. Compare the two tables and there you have it.

It's not exactly scientific, which I guess was the OP's question, but I'm not aware of a formal method to prove equivalency.

Michael OShea 2008-09-16 18:16:40

Answer 4

+3 A:

The best you can do is compare the 2 query outputs based on a given set of inputs looking for any differences. To say that they will always return the same results for all inputs really depends on the data.

For Oracle one of the better if not best approaches (very efficient) is here (Ctrl+F Comparing the Contents of Two Tables):
http://www.oracle.com/technology/oramag/oracle/05-jan/o15asktom.html

Which boils down to:

select c1,c2,c3, 
       count(src1) CNT1, 
       count(src2) CNT2
  from (select a.*, 
               1 src1, 
               to_number(null) src2 
          from a
        union all
        select b.*, 
               to_number(null) src1, 
               2 src2 
          from b
       )
group by c1,c2,c3
having count(src1) <> count(src2);

Dougman 2008-09-18 15:16:50

Answer 5

A:

Perhaps you could draw (by hand) out your query and the results using Venn Diagrams, and see if they produce the same diagram. Venn diagrams are good for representing sets of data, and SQL queries work on sets of data. Drawing out a Venn Diagram might help you to visualize if 2 queries are functionally equivalent.

Kibbee 2008-09-18 15:21:08

Answer 6

+3 A:

This is pretty easy to do.

Lets assume your queries are named a and b

a minus b

should give you an empty set. If it does not. then the queries return different sets, and the result set shows you the rows that are different.

then do

b minus a

that should give you an empty set. If it does, then the queries do return the same sets. if it is not empty, then the queries are different in some respect, and the result set shows you the rows that are different.

EvilTeach 2008-09-19 01:23:24

In SQL Server, You can use EXCEPT to for this approach- http://msdn.microsoft.com/en-us/library/ms188055%28SQL.90%29.aspx

Russ Cam 2009-10-20 16:39:38

This only proves equivalency for that particular set, not for the query in general.

Rik 2009-10-20 18:25:25

@rik yep... that is the intent. i doubt anyone can do proofs with the relational calculus

EvilTeach 2009-10-21 03:23:06

Answer 7

A:

This will do the trick. If this query returns zero rows the two queries are returning the same results. As a bonus, it runs as a single query, so you don't have to worry about setting the isolation level so that the data doesn't change between two queries.

select * from ((<query 1> MINUS <query 2>) UNION ALL (<query 2> MINUS <query 1>))

Here's a handy shell script to do this:

#!/bin/sh

CONNSTR=$1
echo query 1, no semicolon, eof to end:; Q1=`cat` 
echo query 2, no semicolon, eof to end:; Q2=`cat`

T="(($Q1 MINUS $Q2) UNION ALL ($Q2 MINUS $Q1));"

echo select 'count(*)' from $T | sqlplus -S -L $CONNSTR

Mark Harrison 2010-01-23 04:18:08

ansaurus

tags:

views:

answers:

Proving SQL query equivalency

related questions