views:

71

answers:

5

Hi all, new Stacker here!

From a client I receive SQL queries that contain all sorts of redundant sub-SELECTs, generated by a builder in MS Access. I convert the SQL to Postgres, and it runs - but I know it's hopelessly inefficient. So, I set about optimising the query, and produce iterative improvements - and to determine whether each new query is functionally identical, I run it and ensure it produces the same resultset.

This, of course, is not 100% effective at determining whether one query is the same as another - it relies on the data exploring every different case. A colleague suggests I could run an explain plan over each query, but concedes that the internal db optimiser may employ a different query strategy even though the newer query is functionally identical (and this is a good thing - the new strategy may be much more efficient).

I have at least intermediate SQL skills, but am not a confident query optimiser. So, I'd be interested in any toolset for Postgres that can say with 100% certainty that two queries do the same thing. Even better if it can suggest optimisations itself! I believe that TOAD can do this, but I don't believe it is available for Postgres. I'm on Mac OS X.

Thanks all in advance :)

A: 

You could try using TOAD on MySQL to analyze the query. Once you improve the query, port it over to PostgreSQL.

Bill Karwin
That's a great idea - thanks - and I'm impressed that it appears to be free(ware). However it appears to be Win only - so it's out for me on OS X.
halfer
Unless you run Windows on your Mac under VMware or Parallels.
Bill Karwin
That's a good point. But it would require a license... I do have some spare XP licenses I suppose, but it's a lot of effort for a simple tool!
halfer
I think some wine would help here: http://www.winehq.org/
Tometzky
A: 

I would say no. However queries from access should not perform worse in postgres given proper indexes.

You might percieve the queries as slower as msaccess normally returns partial results and only on browsing the dataset continues to execute the query where postgres would normally run the whole query (depending of course on the client you use).

Unreason
Thanks for the reply. However the issue is not whether a query in Access is faster/slower than the same in Postgres. The problem is that, once I have converted a query to Postgres, I +know+ it's inefficient. My task is to improve it whilst being certain that I've not changed the meaning of the query.
halfer
Addendum to the last comment: put another way, I am comparing two queries on Postgres, not one on each platform.
halfer
A: 

Assuming your are improving a PostgreSQL query that is of course already written for PostgreSQL use pgTap (Unit Testing for PostgreSQL), no question about it. You can test for how long a query runs and the results it returns among other things. http://pgtap.org/

StarShip3000
Thanks for the tip. However if I set up unit tests in the form of "does select X on known database Y produce results Z", then I need to consider all the edge cases and add data accordingly. With a set of tests, I therefore cannot be _certain_ that two queries are functionally identical. I guess what I am after is some software that will examine the PK/FK joins and perform the necessary relational mathematics to _prove_ that two queries do the same thing regardless of the dataset. Nevertheless, I will still have a read on that link!
halfer
Out of interest, at the moment my strategy is to ensure that all queries sort the same way, and generate plenty of results. I then pipe each query in psql and then into shasum, and presume that the query meaning is unchanged if the checksum remains static.
halfer
A: 
Tometzky
I will let the boss know! We're on 8.4.4, but I'm too new to PostgreSQL to know whether the new version would remove the need for manual optimisation (and my general realism/cynicism suggests that would be wishful thinking ;-).
halfer
+1  A: 

Great question. Tall order.

As you know there's the level of being logically identical. Then there is the level of generating identical plans as far as which table accesses are required and in what order.

Lastly those underlying plans are compared to the distribution of data in the table to determine whether or not to use a given index on a table, and the use of cached data pages comes into play as well. (30,000 foot description, I know)

If your question is limited to 'perform identically on the same dataset', then comparing EXPLAIN plans would be sufficient, and would encompass both logical identicality and some level of I/O identicality.

Dean Radcliffe
Yes, I thought it would be interesting not to limit the question to a specific dataset - afaict a dataset is not required. However even for a static dataset, I think explain plans could be so different as to make comparisons difficult. Consider (a) EXPLAIN SELECT * FROM users WHERE workgrp = '123' UNION SELECT * FROM users WHERE workgrp = '321' with (b) EXPLAIN SELECT * FROM users WHERE workgrp IN ( '123', '321' ). The query is functionally the same, but the plan is very different. That said, I am on 8.4.4 - perhaps Tometzky's point about version 9 would resolve this? :-)
halfer