tags:

views:

1090

answers:

15

Does anyone have a method to understand complex SQL statements? When reading structural / OO code there are usually layers of abstraction that help you break it down into manageable chunks. Often in SQL, though, it seems that you have to keep track of what's going on in multiple parts of a query all at the same time.

The impetus for this question is the SQL query discussed in this question about a complex join. After staring at the answer queries for a number of minutes I finally decided to step through the query using particular records to see what was going on. That was the only way I could think of to understand the query piece by piece.

Is there a better way to break a SQL query down into manageable pieces?

+7  A: 

Mostly it's just experience and proper indenting.

Stephane Grenier
+2  A: 

I guess it all depends on experience. I didn't find the queries in that questions to be very complicated, perhaps since most of the queries I run are more complex than those.

Proper coding standards certainly helps understanding queries, as it allows to break it into visually smaller and better formatted chunks. When subqueries are involved, it is better to understand what those do first and use that understanding when looking at the complete query.

Eran Galperin
+7  A: 

Here's a procedure to follow to unravel a query.

  1. First I format the SQL.
  2. Then I comment out all parts of the SQL other than the basic parts of the most primary or most important table to answer the question.
  3. Then I will start uncommenting the joins, select columns, groupings, order fields, & filters to issolate different parts of the query to see what is happening. Or highlighted-execution works in some tools.
  4. Subqueries can usually be executed independently.

Executing each of these usually allows me to get a better grip on what is happening in the query.

John MacIntyre
@le dorfier-Thanks for reformatting ... it's a lot easier to read now.
John MacIntyre
It was the best available answer describing my method too.
le dorfier
A: 

The use of CTEs or derived tables (in MS SQL at least) can be useful to format a SQL statement without splitting it into separate queries using temp tables to "join" them.

I agree with others that the mentioned queries are quite simple.

I look at at c# and wonder why you have so many lines to simply process a few thousand rows...

gbn
Warning about CTEs - try writing the query without them first. Many seem to end up adding confusion, and they are not particularly portable. It's easy to use one if the final working query can be simplified by using it.
le dorfier
+1  A: 

You're doing what I do. My first tool for making a query comprehensible is good visual organization, which people in the question you reference are mostly doing, and testing in manageable chunks using LIMIT clauses. If non-correlated subqueries are involved, they can be run separately, of course. If there's a magic bullet, though, I don't know about it.

chaos
+1  A: 

If these are giving you pause, I'd suggest writing out the tables on paper to get a better feel for what it means to join things together.

Suppose for example that you have a table for Books and a table for Prices. The Prices table may have multiple entries for each book (since the price can change).

If you want to get a list of the current books and prices, you have to join the two tables together.

I'd work through this on paper by drawing arrows between each book and its corresponding "current" price. Then I'd write that into logic which would become part of the join condition or subquery.

Once you get the hang of it, the complex queries get easier to parse.

Michael Haren
+9  A: 

These may be some helpful hints..

  • Comments - figure out what a small chunk does and comment it so you understand it when you refer back to it later.
  • Syntax highlighting - make sure you're viewing code with something that will color-code the query.
  • Indentation - reorganize the query to make sense for you.. tab things over, add carriage returns.

For example:

select ID, Description, Status from ABC where Status = 1 OR Status = 3

could be better written as:

select 
  ID,
  Description,
  Status
from ABC
where
  Status = 1 OR
  Status = 3

with a more complex query, you'd see a much bigger benefit.

Kon
+4  A: 

formatting helps, but understanding set theory and by extension, relational theory, helps even more.

a vague understanding of how queries are executed won't hurt either (table scans, index scans, index jumps, hash-table merges, etc.); the query planner can show you these operations

a few of the operations (having, exists, with) can be troublesome at first

understand first what happens to each table, and how the tables are joined

Steven A. Lowe
I just cargo-culted a new view based on an old one, and I'm using a "OVER (PARTITION BY" that I wrote, and which works, but I have no idea how or why it works.
Paul Tomblin
yeah, I do the same with CTEs and PIVOTs. one of these days I'll figure them out.
Jimmy
Nothing like those proprietary language extensions to obfuscate things. And programmers who use them to prove they can.
le dorfier
These are all parts of the SQL standard and not proprietary extensions.
Jeremiah Peschka
standard, but obscure ;-)
Steven A. Lowe
A: 

If you're using PostgreSQL, view encapsulation is wonderful for this.

Kev
+1  A: 

Indentation and comments help a lot. The most valuable thing I have run into is the WITH statement. It is in Oracle, and deals with subquery refactoring. It allows you to break a large query, into a set of seemingly smaller ones. Each just a bit more manageable.

Here is an example

WITH 
txssnInfo AS
(
    SELECT SSN, 
           UPPER(LAST_NAME), 
           UPPER(FIRST_NAME), 
           TAXABLE_INCOME,          
           CHARITABLE_DONATIONS
    FROM IRS_MASTER_FILE
    WHERE STATE = 'TX'                 AND -- limit to texas
          TAXABLE_INCOME > 250000      AND -- is rich 
          CHARITABLE_DONATIONS > 5000      -- might donate too

),
doltishApplicants AS
(
    SELECT SSN, 
           SAT_SCORE,
           SUBMISSION_DATE
    FROM COLLEGE_ADMISSIONS
    WHERE SAT_SCORE < 100          -- Not as smart as some others.
),
todaysAdmissions AS
(
    SELECT doltishApplicants.SSN, 
           TRUNC(SUBMISSION_DATE)  SUBMIT_DATE, 
           LAST_NAME, FIRST_NAME, 
           TAXABLE_INCOME
    FROM txssnInfo,
         doltishApplicants
    WHERE ssnInfo.SSN = aidApplicants.SSN

)
SELECT 'Dear ' || FIRST_NAME || 
       ' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE)    -- For stuff received today only
;

The same thing can be done with inline views, but the with also has the ability to create temporary tables when needed. In some of the cases, you can copy out the subquery and execute it, outside the context of the large query.

This form also allows you to put the filter clauses with the individual subquery, and save the joining clauses for the final select.

At work, our development group generally finds them easier to maintain, and frequently faster.

EvilTeach
The WITH clause is supported by SQL Server 2005 and SQL Server 2008 as well.
Registered User
+1  A: 

Another importan is use standard join syntax:

SELECT A 
  FROM B
  JOIN C ON B.ID = C.ID
 WHERE C.X = 1

Instead of

SELECT A 
  FROM B
     , C 
 WHERE B.ID = C.ID 
   AND C.X = 1
FerranB
+1  A: 

As with anything, the BEST way is to write lots of complex SQL statements yourself. Eventually the general way things are structured becomes apparent. Of course, if you're looking for something quick that probably isn't the way.

White space is very important. A query that looks incredibly complex can look almost simplistic when the proper white space is present.

As to the joins... Well, I'm sorry but I can't be very helpful here, because my answer is that the best way to understand a particular join is to understand how joins work in general. Each type of join serves a very specific purpose and if you know how they work, there shouldn't really be much of a difference from joining x to y, x to y to z, or x and y to a and b.

What may help more immediately, however, is knowing that you need to look at the innermost pieces first. As opposed to code where you're probably used to looking at things on the grand scale then digging into the granularity, with a query it's more helpful and easy to understand if you start with the granularity and work your way outward.

Start with any subqueries, figure out what they're doing in individual pieces treating it as a single query (if possible) then gradually move out step by step until you're at the top. Once again, on the joins... Really, just go find a web page that explains joins and do some tests until you fully understand them. There's not really a way to make that easier, as once you understand them you can pretty much figure out anything with joins that you want.

PhoenixRedeemer
+8  A: 

When I look at a complex bit of SQL Code, this is what I do.

First if is an update or delete, I add code (if it isn't there and commented out) to make it a select. Never try an update or delete for the first time without seeing the results in a select first. If it is an update, I make sure the select shows the current value and what I will be setting it to in order to make I'm getting the desired result.

Understanding the joins is critical to understanding complex SQL. For every join I ask myself why is this here? There are three basic reasons. You need a column for the select, you need a field for the where clause or you need the join as a bridge to a third table. If it is a left or right join (I tend to rewrite so everything is a left join though makes life simpler.), I consider whether an inner join would work. Why do I need a left join? If I don't know the answer, I will run it both ways and see what the difference is inthe data. If there are derived tables, I will look at those first (running just that part of the select to see what the result is) to understand why it is there. If there are subqueries, I will try to understand them and if they are slow will try to convert to a derived table instead as those are often much faster.

Then I look at the where clauses. This is one place where a solid foundation in your particular database will come in handy. For instance, I know in my databases what occasions I might need to see only the mailing address and what occasions I might need to see other addresses. This helps me to know if something is missing from the where clause. Other wise I consider each item in the where and figure out why it would need to be there, then consider if there is anything missing that should be there. After looking it over, I consider if I can make adjustments to make the query sargable.

I also consider any complex bits of the select list next. What does that case statement do? Why is there a subquery? What do those functions do? (I always look up the function code for any fuinction I'm not already familiar with.) Why is there a distinct? Can it be gotten rid of by using a derived table or aggregate function and group by statements?

Finally and MOST important, I run the select and determine if the results look correct based on my knowledge of the business. If you don't understand your business, you won't know if the query is correct. Syntactically correct doesn't mean the right results. Often there is a part of your existing user interface that you can use as a guide to whether your results are correct. If I have a screen that shows the orders for a customer and I'm doing a report that includes the customer orders, I might spot check a few individual customers to make sure it is showing the right result.

If the current query is filtering incorrectly, I will remove bits of it to find out what is getting rid of the records I don't want or adding ones I don't want. Often you will find that the join is one to many and you need one to one (use a derived table in this case!) or you will find that some piece of information that you think you need in the where clause isn't true for all the data you need or that some piece of the where clasue is missing. It helps to have all the fields in the where clause (if they weren't in the select already) in the select at the time you do this. It may even help to show all the fields from all the joined tables (please use nolock if doing this!) and really look at the data. When I do this, I often add a small bit to the where clause to grab just some of the records that I have that shouldn't be there rather than all the records.

HLGEM
+1 - covered the key points I would have mentioned for this question.
ConcernedOfTunbridgeWells
+1, this covered everything I would have said.
Jeremiah Peschka
A: 

I break it down into smaller queries (that's why I like sub-queries more than JOINs)

Sometimes I even save the results of the sub-query as a table and use that in the main query. It's somewhat like simplifying a code expression by saving bits into local variables and then operating on the local variables in the next part of the expression.

I am fanatic about always using table aliases (e.g. CLIENT_HISTORY_ITEM T1) and parentheses around criteria expressions. I often change the table alias number by ten or so for each part of the query so I can see what is coming from where:

SELECT T1.ID FROM TABLE1 T1 WHERE T1.DATE = (SELECT MAX(T11.DATE) FROM TABLE1 T11 WHERE (T1.AREA = T11.AREA))

Cheers

Richard Haven
+1  A: 

Forget all that stuff.

Paste the sql into a tool that will draw pictures & vice versa. I used to even start all my queries like this in the MS Access days and it did a fabulous job. Most queries I could make in 1/2 the time to type them (yes I timed a few), and that doesn't include debugging the typos, etc. I would have made. Then, I just change what I need to paste it in its final destination

I know this is backwards from the question, but, of course, if you have available the database against which the query must run, and the full query, you can just paste it in (replacing,e.g., parameters with sample values, etc). Then switch to design mode. Hopefully, of course, you've got a better tool than access to work with!

FastAl