views:

679

answers:

3

Hi,

I have some SQL similar to the following, which joins four tables and then orders the results by the "status" column of the first:

SELECT * 
 FROM a, b, c, d 
 WHERE b.aid=a.id AND c.id=a.cid AND a.did=d.id AND a.did='XXX'
 ORDER BY a.status

It works. However, it's slow. I've worked out this is because of the ORDER BY clause and the lack of any index on table "a".

All four tables have the PRIMARY KEYs set on the "id" column.

So, I know I need to add an index to table a which includes the "status" column but what else does it need to include? Should "bid", "cid" and "did" be in there too?

I've tried to ask this in a general SQL sense but, if it's important, the target is SQLite for use with Gears.

Thanks in advance,

Jake (noob)

A: 

I'd be curious as to how you worked out that the problem is 'the ORDER BY clause and the lack of any index on table "a".' I find this a little suspicious because there is an index on table a, on the primary key, you later say.

Looking at the nature of the query and what I can guess about the nature of the data, I would think that this query would generally produce relatively few results compared to the size of the tables it's using, and that thus the ORDER BY would be extremely cheap. Of course, this is just a guess.

Whether an index will even help at all is dependent on the data in the table. What indices your query optimizer will use when doing a query is dependent on a lot of different factors, one of the big ones being the expected number of results produced from a lookup.

One thing that would help a lot is if you would post the output of EXPLAINing your query.

Curt Sampson
+2  A: 

I would say it's slow because the engine is doing scans all over the place instead of seeks. Did you mean to do SELECT a.* instead? That would be faster as well, SELECT * here is equivalent to a.*, b.*, c.*, d.*.

You will probably get better results if you put a separate index on each of these columns:

  • a.did (so that a.did = 'XXX' is a seek instead of a scan, also helps a.did = d.id)
  • a.cid (for a.cid = c.id)
  • b.aid (for a.id = b.aid)

You could try adding Status to the first and second indexes with ASCENDING order, for additional performance - it doesn't hurt.

Sam
The SQL I posted was a simplified version of what;s in use, to save confusion.In reality I am selecting a couple of specific columns from each of the tables and not actually using "*"I'll try your suggestion of adding three indexes and see how I get on.
Jake Howlett
Brilliant! It worked. Went from 0.4s to 0.001s !! Can't say fairer than that!My error was that I was trying to get all the columns in one index rather than spread them out like you suggest Sam. What I really need to do is learn SQL from first principles... if only I had the time to read books...Thanks!
Jake Howlett
Yeah, handy hint to remember is that the optimizer will only use an index where there is a predicate for the first column in the index
Sam
Not true - Oracle will do a "skip scan" even if the first column of an index is not in the where clause but subsequent columns are.
Gandalf
A: 

have you tried joins?

select * from a inner join b on a.id = b.aid inner join c on a.cid = c.id inner join d on a.did=d.id where a.did='XXX' ORDER BY a.status

the correct use of joins (left, richt, inner, outer) depends on structure of tables

hope this helps

jan
I thought what I was doing was a "join" (of sorts). A natural join, no?I did try it with inner joins but it still displayed the same different in time take for an ordered vs. non-ordered query
Jake Howlett
Both queries are valid and equivalent, despite slight variations in their expression.
Sam
Using 'where' instead of 'join' is slightly faster because the 'join' expression is first converted into a 'where' expression. Read here: http://www.sqlite.org/optoverview.html: "The ON and USING clauses of a join are converted into additional terms of the WHERE clause"
tuinstoel