views:

224

answers:

4

I like to think I know enought theory, but I have little experience optimizing DB in real world. I would like to know points of view, thoughts or experiences.

Let's imagine a scenario like:

Table A Key: c1, c2, c3, c4 Index: c7, c3, c2

Table B Key: c1, c2, c3, c4 Index: c1, c5

All are non-clustered. The tables have 40+ fields. They are feeded daily by night and have some updates during the day.

Table A, if more queries benefit from Key than Index, could the index impact negatively? Because insert/delete has to update 2 indexes instead of 1.

Table B, has an extra field at the index, not present in key.

Could a query using c1, c5

Benefit from this Key?: Key: c1, c2, c3, c4, c5

So that Index could be droped.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

A typical scenario for me is process_date, client_number, operation. And it feeds with a bunch of data each day (process_date).

+1  A: 

If more queries benefit from Key than Index, could the index impact negatively?

Yes.

But...

You have to actually measure your actual workload to see if this is true. It's hard to predict from purely theoretical grounds. Possible, but hard.

Could a query using c1, c5 Benefit from this Key?: Key: c1, c2, c3, c4, c5

Rarely. The query algorithms rarely make use of part of a key or index. It's usually all or nothing. If the whole key (or index) can't be used, then none of it is used.

It's easy to get a query execution plan and get a definite answer to this question.

Learn to to get the execution plans and actually get them.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

Almost no impact at all. In some databases, it can change how the rows appear if you omit an ORDER BY clause. In other databases, it has no impact because the physical rows and the order of the key index are separate.

You can easily drop and recreate the index, check execution plans, and see what -- if any -- impact this has.

The only way to be sure is to get the execution plans and look at them.

S.Lott
What is your basis for saying that an index is all or nothing? A query on c1, c2 could certainly use an index on c1, c2, c3. Even an index on c1, c3, c2 could be used if it's the only covering index available. Can you clarify?
Tom H.
I would love to get a query execution plan, but my scenario is up and running, so queries are already sepcified, can't change them in short term.
mencargo
You still need to get query execution plans, even if the SQL is already written. You are talking about index creation; the only way to be sure the index is -- or is not -- helping is to get execution plans from the actual queries in the actual application.
S.Lott
@Tom H.: In looking at query execution plans in Oracle (and DB2) I never saw indexes used unless the entire index was relevant. Perhaps my experience is out-of-date, but I've never seen an index used unless the entire index was part of the query.
S.Lott
A: 

If you're trying to optimize your indexing strategy, you can run the Database Tuning Advisor or query sys.dm_db_index_usage_stats to find out how your indexes are being used. (These stats are cleared when the SQL Server process is restarted).

In answer to your first question, indexes always negatively impact write performance, so it can be useful to weed out unused ones. But make sure they are really unused first.

In answer to your second question, it's unlikely that removing the index will increase the query performance, and in fact may decrease it, since the index is covering the query with the minimal number of columns. If by "using c1, c5" means that those are the only two columns involved, then removing the index will likely reduce performance slightly. If the query is selecting more columns than are in the index, then the index has to join to the table anyway if it gets used, so it's unlikely to affect performance much if you drop it. The only way to be sure is to analyze the query cost for both scenarios.

womp
I think sys.dm_db_index_usage_stats is for Microsoft SQL Server only, I don't knnow if there's an equivalent for Sybase
mencargo
Yes, it is SQL Server specific. I thought for some reason he tagged it with sql-server but I guess not. Hopefully it helps anyway?
womp
I tagged it with "sql" not "sql-server", but it could help a lot if I find a Sybase equivalent, regards.
mencargo
+1  A: 

if more queries benefit from Key than Index, could the index impact negatively? Because insert/delete has to update 2 indexes instead of 1.

A non-clustered index has a negative impact on insert/update/delete performance. The negative impact is usually outweighed by the increased performance of selects.

Could a query using c1, c5 Benefit from this Key?: Key: c1, c2, c3, c4, c5

Yes, if only a few rows share the same c1, then the index will be very effective.

What impact does the order of the fields has? Key: c1, c2, c3 Key: c3, c1, c2

Order is important, both for filtering and ordering. An index on (c1,c2) can be used for where c1 = 1 and where c1 = 1 and c2 = 1, but not for where c2 = 1. Likewise, it helps with order by c1, but not with order by c2.

Andomar
Does a clustered index have no/less impact on insert/update/delete?Combining your other responses, if c1 is a date, and lots of rows have the same, and c5 is a costumer ID, the Key should be c5, c2, c3, c4, c1 for the query using c1, c5?
mencargo
@mencargo: A clustered index has less impact on insert/update/delete because it's stored as part of the table. The table has to be modified whether there's a clustered index or not. A key should uniquely identify a row. So for a customer table, CustomerID and only CustomerID should be the key.
Andomar
A: 

Indexes can certainly have a negative impact on table changes (either inserts, updates, or deletes). Unless you have a large number of unnecessary indexes or your system has an extremely high rate of data changes one or two extra indexes is unlikely to be a problem. Don't just put indexes on a table for no reason, but you usually don't have to go through extensive analysis to try to avoid an index.

The key on your table should be the minimum set of columns which uniquely identifies a row. It should not contain additional columns. For example, if I have a table of unique email addresses and my key is on email_address then I can only have a single row for "[email protected]". If I now add the description to the key because I use the description in a lot of my queries then suddenly I could have: "[email protected]", "Description #1" AND "[email protected]", "Description #2". Your data is no longer being properly constrained and you end up with a big mess on your hands.

The ordering of the columns in your key can have a very significant impact depending on the queries which you're using. For example, if you're looking for rows based on c1, c2 then a key of c1, c2, c3 would be very useful. If your key is c1, c3, c2 then it's not so useful.

Imagine it this way, I want you to look in the phone book and find everyone whose last name begins with "TO". That's a pretty simple request. Now, what if the names were ordered by the first letter of the last name followed by the third letter of the last name? It would be very difficult and time-consuming to find those names starting with "TO".

Tom H.