views:

290

answers:

8

ATM I am trying to learn how to efficiently use database inidices and would appreciate to get some expert input. I do not have any performance issues currently. I would just like to know, how you would handle your indices with this query:

SELECT B.event, 
       COALESCE(B.system, C.surname || ' ' || C.forename) AS name, 
       C.label, 
       B.timestamp
FROM A            
  INNER JOIN B ON A.event=B.event
  INNER JOIN C ON B.state=C.id
  LEFT OUTER JOIN D ON B.hur=D.id             
WHERE A.id IN(12,13,14,15,...) 
  ORDER BY B.event, B.timestamp

A.id, C.id and D.id are already primary keys

UPDATE normally i would put INDEX(A.event) and INDEX(B.event, B.timestamp). Is this correct? And what about B.event, B.state and B.hur ?

+2  A: 

You could add indexes to everything in the WHERE and ORDER BY clauses. Ie A.event, B.event and B.timestamp.

darasd
Don't add indexes blindly. See Lieven's answer. Add indexes blindly can hurt performance as each index needs to be maintained. In some cases like small tables this will hurt as the IO used to maintain could be used somewhere else. Sometimes a full table scan on a small table is better than indexes.
jim
I'm tempted to delete my answer; however, jim's comment is useful information, so should the answer be left?
darasd
A: 

SELECT B.event, B.system, COALESCE(C.surname) || ' ' || COALESCE(C.forename) AS name, C.label, B.timestamp FROM A
INNER JOIN B ON A.event=B.event INNER JOIN C ON B.state=C.id LEFT OUTER JOIN D ON B.hur=D.id
WHERE A.event = ANY(:visits) ORDER BY B.event, B.timestamp

Also the ORDER BY will slow things down badly. Make sure these are indexed: A.event B.event B.state C.id B.timestamp

JL
+4  A: 

I usually take these steps when trying to speed up my queries

  1. analyze the execution plan.
  2. try to create (covering) indexes to eliminate table scans.
  3. try to create (covering) indexes to eliminate index scans.

As for you query, you would not go wrong with creating indexes on

  • A.event
  • B.event
  • B.state
  • B.Hur
Lieven
Creating seperate indexes for B.event and B.state is not the same as creating an index on (B.event,B.state). It's important to differentiate this as it has a direct impact on performance.
Dems
@Dems: if that was left unclear in my answer, you are right offcourse.
Lieven
A: 

Add inidices on A.event and B.event. Other would be useless i think.

Pomyk
+1  A: 

I would add indexes to anything that is joined, in the where clause or in the order by clause.

In this case add indexes of the following (assuming ID fields are primary keys and already indexed) :

  1. A.event
  2. B.event
  3. B.state
  4. B.Hur
  5. B.event, B.timestamp (combined index of both fields)

The 5th one, being an index combination should speed the order by.

You need to temper the number of indexes, against any performance drop you have in inserting records into the table (the more indexes you add to the table, the slower inserts and updates will be, as indexes need to be updated).

Jayden
+2  A: 

Rewrite your query as this:

SELECT  B.event, 
        COALESCE(B.system, C.surname || ' ' || C.forename) AS name, 
        C.label, 
        B.timestamp
FROM    B            
INNER JOIN
        C
ON      C.id = B.state
LEFT OUTER JOIN
        D
ON      D.id = B.hur
WHERE   B.event IN
        (
        SELECT  event
        FROM    A
        WHERE   A.id IN (12, 13, 14, 15)
        )
ORDER BY
        B.event, B.timestamp

, and create a composite index on B (event, timestamp)

Quassnoi
+2  A: 

It is important to note that the order of the fields in the index is important.

An index is, in a sense, a search tree. If you index (B.event,B.state) then the tree will group together all records with the save "event" field, then order them by the "state" field.

If you were then to query that index for "b.state = x", the index would be of little use; The index is ordered by the "event" first.


In your example:
- filter A by it's "event" field
- join A.event to B.event
- join B.state to C.id
- Join B.hur = D.id
- Order by B.event, B.timestamp

It's important to note that the optimise will look at the statistics of your tables, and indexes, then may re-arrange the order of the joins. The result will be the same, but the order may give different performance, and the optimisers job is to try to find the best performance.

In your case I would expect B.event's order to be extremely important. Simply because that's the order of the resulting output, AND it's the field you filter by.

Next you join B.state to C.id. So having and index on C.id is good, it makes the join faster. But equally, having the B table data in a nice order may also make the join faster.

But, having an index on B.event and a separate index on B.state may yield little. The B.state index becomes next to pointless because we're using the B.event index. If you combine the two into one index (b.event then b.state) the execution plan may find a way to use the b.state part of the index.

Finally, if you put all the fields in the index, the index does get bigger, but the query may never actually need to look at the table. The information is in the index. The time taken to go from an index to the table to find the 'missing' fields is similar to that of a join. So for read performance, adding extra fields to the index can be of significant beenfit.

I'm wittering on now, but the summary is this:
- Usually, separate index on separate fields don't get used together
- For composite indexes, the order you specify the fields makes a difference
- Adding 'extra' fields to the index makes it bigger, but also can make queries faster
- The order of the execution plan matters more than the order of your query
- But the indexes you have can determine the order of the execution plan

This kind of work has no categorical answers. It is so dependent on your data it's closer to an art.

One option is to over load the tables with indexes, look at the resulting execution plan, and delete the indexes that are not necessary.

But even there a caveat applies. Because the execution plan is data (and table statistics) dependent, it is very important to have real world data in the tables. While the tables have 10' or 100s of rows, one execution plan may be fastest. But when you get millions of rows the execution plan can change, and so benefit from different indexes.

Dems
+1  A: 

Run explain analyze of the query, and read it - if it doesn't help - put the explain analyze output on explain.depesz.com and check what it "says".

depesz