views:

184

answers:

4

I have a sql statement where I'm joining about 4 tables, each with 200K rows. The query runs, but keeps freezing. When I do a join on 3 tables instead, it returns the rows (takes about 10secs). Any suggestion why? suggestions to speed up?

Thanks!

Code

SELECT *
FROM equipment, tiremap, workreference, tirework
WHERE equipment.tiremap = tiremap.`TireID` AND 
      tiremap.`WorkMap` = workreference.`aMap` AND
      workreference.`bMap` = tirework.workmap
LIMIT 5

p.s

and if it helps any, I'm using sql alchemy to generate this code, the sqlalchemy code for this is

query = session.query(equipment, tiremap, workreference, tirework)
query = query.filter(equipment.c.tiremap == tiremap.c.TireID)
query = query.filter(tiremap.c.WorkMap==workreference.c.aMap)
query = query.filter(workreference.c.bMap == tirework.c.workmap)
query = query.limit(5)
query.all()
A: 

It could be that the 4th table you are joining to is much larger than the others. It could also be that the column you are joining on doesn't have an index on it.

TskTsk
+4  A: 

Make sure you have indexes on:

  • equipment (tiremap)
  • tiremap (TireID)
  • tiremap (WorkMap)
  • workreference (aMap)
  • workreference (bMap)
  • tirework (workmap)

Edit: I guess I should provide some context to that for completeness.

The SQL optimizer looks at a statement, parses it and then determines an execution plan for it based on the query, the tables referenced and available indexes. If you do SELECT * FROM tab1 then it'll do a full table scan of tab1 because there's no other way to execute that.

If you do SELECT * FROM person WHERE lastname LIKE 'V%' and you have a million records, it will be slow to interrogate every row but if lastname is indexed it's much more efficient.

With a query like yours one of those tables will be the driving table that regardless of indexes may simply be done as a full table scan. There's nothing wrong with this. One table has to drive the query. If there is a WHERE clause (for something other than join conditions) this may change but otherwise it's generally true.

From that driving table, MySQL will then start appending joins to the execution plan. These joins will require indexes on the other side to make this work efficiently.

So with three tables you may have one table that's not indexed but it doesn't matter because it drives the query. With the fourth table, there may be two unindexed tables and that's now a problem because for each row in one MySQL will have to do a full table scan of the other.

So basically you create an index on every foreign key and join column so MySQL can use what's available to make the best execution plan for the query you give it.

Lastly, most tools will tell you about the database schema. PHPMyAdmin is a popular one for hosted databases. Personally I actually like a desktop app for this kind of thing. Navicat Lite is a decent free tool for this.

cletus
awesome man, I'll check that. 'preesh.
colorfulgrayscale
also, is there a way to know if the tables are indexed or not?ok, nvm, i just googled it. thanks.
colorfulgrayscale
@colorfulgrayscale added some context to my statement.
cletus
@cletus, thanks again. figured it out :)
colorfulgrayscale
A: 

Most SQL database have some variation of "EXPLAIN PLAN" or "EXPLAIN" that you can use to see how it's parsing the query. Look for full table scans as a place where you need indexes.

Paul Tomblin
A: 

You are doing a natural join of 4 tables. Also, in your "WHERE" statement, there are no special conditions.

The database engine will do the following thing :

It will first do a recursive product of all the data in each table.

Consider the following rows in tables A, B and C:

A = rowA1
    rowA2
    rowA3;
B = rowB1
    rowB2
    rowB3;
C = rowC1
    rowC2
    rowC3;

Basically, if you do a natural join of those 3 tables, the engine will have in memory:

rowA1 - rowB1 - rowC1
rowA1 - rowB1 - rowC2
rowA1 - rowB1 - rowC3
rowA1 - rowB2 - rowC1
rowA1 - rowB2 - rowC2
rowA1 - rowB2 - rowC3
rowA1 - rowB3 - rowC1
rowA1 - rowB3 - rowC2
rowA1 - rowB3 - rowC3
...
...
...
rowA3 - rowB3 - rowC1
rowA3 - rowB3 - rowC2
rowA3 - rowB3 - rowC3

In total, 27 rows are put in memory. However, we only want 3 rows :

rowA1 - rowB1 - rowC1
rowA2 - rowB2 - rowC2
rowA3 - rowB3 - rowC3

If your database engine doesn't do optimization by itself, a natural join of 3 table is very expensive. For 4 tables, it is unconceivable, even for a limited number of rows.

Now, how can we get something better ?

First, by looking at the code, we know that we only need 5 values. Also, in database optimization, it is said that you should make the SELECT the earliest possible.

Here is some untested code that should help you. You may have to modify it, depending on what DB engine you are using :

SELECT *
FROM (SELECT * FROM equipment LIMIT 5) e, tiremap, workreference, tirework
WHERE e.tiremap = tiremap.TireID AND
      tiremap.WorkMap = workreference.`aMap` AND
      workreference.`bMap` = tirework.workmap

Just by doing this, it should feel like we had only 3 tables, and not 4. Still, this is not really what you want. If one row of "equipment" is not referenced in the other tables, you will get less than 5 rows at the end. However, this an example to show you that we might not really need all the rows from all the tables.

Now, what I think you want could be this :

SELECT * FROM equipment 
INNER JOIN tiremap ON equipment.tiremap = tiremap.TireID
INNER JOIN workreference ON tiremap.WorkMap = workreference.aMap
INNER JOIN tirework ON workreference.bMap = tirework.workmap
LIMIT 5

You might have a problem here : if your engine is not that good (mySQL, sorry), it can take a long time.

If you really want to do the optimization yourself :

SELECT * FROM tirework, 
   (SELECT * FROM workreference, 
       (SELECT * FROM tiremap,
           (SELECT * FROM equipment) e
        WHERE e.tiremap = tiremap.TireID) t
    WHERE t.WorkMap = workreference.aMap) w
WHERE w.bMap = tirework.workmap
LIMIT 5

And voilà ! Even if your engine optimizer is nonexistent, that query shouldn't take too long. Instead of making a big product of everything, your engine will do one product at a time and get the bad rows out before joining it with a new table.

Try it.

Pascal Potvin
zomg, thanks a lot. This is exactly what I was looking for. peace.
colorfulgrayscale