views:

96

answers:

3

I find that when trying to construct complex MySQL joins and groups between many tables I usually run into strife and have to spend a lot of 'trial and error' time to get the result I want.

I was wondering how other people approach the problems. Do you isolate the smaller blocks of data at the end of the branches and get these working first? Or do you start with what you want to return and just start linking tables on as you need them?

Also wondering if there are any good books or sites about approaching the problem.

A: 

I haven't used them myself so can't comment on their effectiveness, but perhaps a GUI based query builder such as dbForge or Code Factory might help?

And while the use of Venn diagrams to think about MySQL joins doesn't necessarily help with the SQL, they can help visualise the data you are trying to pull back (see Jeff Atwood's post).

Mark Streatfield
+1  A: 

Well the best approach to break down your MySQL query is to run the EXPLAIN command as well as looking at the MySQL documentation for Optimization with the EXPLAIN command.

MySQL provides some great free GUI tools as well, the MySQL Query Browser is what you need to use.

When running the EXPLAIN command this will break down how MySQL interprets your query and displays the complexity. It might take some time to decode the output but thats another question in itself.

As for a good book I would recommend: High Performance MySQL: Optimization, Backups, Replication, and More

Phill Pafford
+1  A: 

I don't work in mySQL but I do frequently write extremely complex SQL and here's how I approach it.

First, there is no substitute whatsoever for thoroughly understanding your database structure.

Next I try to break up the task into chunks.

For instance, suppose I'm writing a report concerning the details of a meeting (the company I work for does meeting planning). I will need to know the meeting name and sales rep, the meeting venue and dates, the people who attened and the speaker information.

First I determine which of the tables will have the information for each field in the report. Now I know what I will have to join together, but not exactly how as yet.

So first I write a query to get the meetings I want. This is the basis for all the rest of the report, so I start there. Now the rest of the report can probably be done in any order although I prefer to work through the parts that should have one-one relationshisps first, so next I'll add the joins and the fields that will get me all the sales rep associated information.

Suppose I only want one rep per meeting (if there are multiple reps, I only want the main one) so I check to make sure that I'm still returning the same number of records as when I just had meeting information. If not I look at my joins and decide which one is giving me more records than I need. In this case it might be the address table as we are storing multiple address for the rep. I then adjust the query to get only one. This may be easy (you may have a field that indicates the specific unique address you want and so only need to add a where condition) or you may need to do some grouping and aggregate functions to get what you want.

Then I go on to the next chunk (working first through all the chunks that should have a 1-1 relationshisp to the central data in this case the meeting). Runthe query nd check the data after each addition.

Finally I move to those records which might have a one-many relationship and add them. Again I run the query and check the data. For instance, I might check the raw data for a particular meeting and make sure what my query is returning is exactly what I expect to see.

Suppose in one of these additions of a join I find the number of distinct meetings has dropped. Oops, then there is no data in one of the tables I just added and I need to change that to a left join.

Another time I may find too many records returned. Then I look to see if my where clause needs to have more filtering info or if I need to use an aggreagte function to get the data I need. Sometimes I will add other fields to the report temporarily to see if I can see what is causing the duplicated data. This helps me know what needs to be adjusted.

The real key is to work slowly, understand your data model and check the data after every new chunk is added to make sure it is returning the results the way you think they should be.

Sometimes, If I'm returning a lot of data, I will temporarily put an additonal where clause on the query to restrict to a few items I can easily check. I also strongly suggest the use of order by because it will help you see if you are getting duplicated records.

HLGEM
Thanks for sharing your approach. This is great. :)
Das123