We have a table with, say, 5 indices (one clustered).

Question: will it somehow negatively affect optimizer performance - either speed or accuracy of index picks - if all 5 indices start with the same exact field? (all other things being equal).

Someone at the company suggested that it may have a detrimental effect on performance, and that one of the indices therefore needs to have its first two fields switched.

I would prefer to avoid change if it is not necessary, since they didn't back up their assertion with any facts/reasoning, but the guy is senior and smart enough that I'm inclined to seriously consider what he suggests.

NOTE1: The basic answer "tailor the index to the where clauses and overall queries" is not going to help me - the index that would be changed is a covered index for the only query using it and thus the order of the fields in it would not affect the IO amount. I have asked a separate SO question just to confirm that assertion.

NOTE2: That field is a date when the records are inserted, and the table is pretty big, if this matters. It has data for ~100 days, about equal # of rows per date, and the first index is a clustered index starting with that date field.

A: 

I'm not familiar with the recent versions of Sybase, but in general, with all SQL servers, the main (and almost the only) performance impact indexes have is on INSERT, DELETE and UPDATE queries. Basically, each change to the database requires the data table per se (or the clustered index) to be updated, as well as all the other indexes.

With regards to SELECT queries, having "too many" indexes may have a minor performance impact for example by introducing competing hard disk pages for cache. But I doubt this would be a significant issue in most cases.

The fact that the first column in all these indexes is the date, assuming a generally monotonic progression of the date value, is a positive thing (with regard to CRUD operations), for it will keep the need for splitting/balancing the indexes to a minimum (since most inserts land at the end of the indexes).

Also, this table appears to be small enough ("big" is a relative word ;-) ) that some experimentation to assess performance issues in a more systematic fashion could probably be done relatively safely and easily, without interfering much with production (unless the 10k or so records are very wide, the queries-per-second rate is high, etc.).
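As a rough illustration of the kind of experiment meant here, the sketch below times a batch of inserts into a table with one index versus five. It is only a sketch under loud assumptions: it uses SQLite through Python's `sqlite3` module as a stand-in (not Sybase), and the table `t`, the columns, and the `ix_*` index names are all hypothetical. Absolute timings will not carry over to another engine; only the method of comparing configurations does.

```python
import sqlite3
import time

COLUMNS = ["a", "b", "c", "d", "e"]  # hypothetical non-date columns


def time_inserts(index_count: int, rows: int = 20000) -> float:
    """Return seconds spent inserting `rows` rows into a table that has
    `index_count` indexes, each leading with the same date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ins_date INTEGER, a INTEGER, b INTEGER, "
        "c INTEGER, d INTEGER, e INTEGER)"
    )
    for col in COLUMNS[:index_count]:
        conn.execute(f"CREATE INDEX ix_{col} ON t (ins_date, {col})")

    # Monotonically increasing "date" value, about 200 rows per date.
    data = [(i // 200, i, i * 2, i * 3, i * 5, i * 7) for i in range(rows)]

    start = time.perf_counter()
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", data)
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


if __name__ == "__main__":
    for n in (1, 5):
        print(f"{n} index(es): {time_inserts(n):.4f} s")
```

Running each configuration several times and comparing the medians gives a steadier picture than a single run.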

mjv
I have a separate question that I soon plan to post specifically asking about impact of this situation on inserts
DVK
Not sure what's considered "big", it has ~50mm rows and ~100Gigs of data.
DVK
Well, 50 million rows is definitely "respectable", and worthy of close attention to performance issues... I thought circa 10,000 (100 days * ~100 rows per date). I think that "the same" in your question expressed the idea that every day has +/- the same number of rows as the other days.
mjv
Eh. I was overly pessimistic (or optimistic?) in my original guesstimate of how many days' worth we have.
DVK
A: 

I doubt it would have any major impact on SELECT performance.

BUT it probably means you could reorganise those indexes (based on a representative query workload) to serve queries more efficiently.

Mitch Wheat
Assume that the other indexes are already as optimized as we were able to make them to the best of our ability, and all queries - other than the one for which the covered index was added - NEVER use the to-be-changed covered index already due to their column set.
DVK
@DVK: if the indexes are as optimised as you can make them, then by definition, there is no question!! ;)
Mitch Wheat
A: 

The optimizer has to think more about which, if any, of the indexes to use if there are five. That cost is usually not too bad, but it depends on the queries you're asking of it. In principle, once the query is optimized, the time taken to execute it should be about the same. If you are preparing SELECT statements for multiple uses, that won't matter much. If every query is prepared afresh and never reused, then the overhead may become a drag on system performance, particularly if it turns out that it really doesn't matter which of the indexes is actually used for most queries (a moderately strong danger when five indexes all share the same leading columns).
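One way to see which index an optimizer settles on is to ask it for the query plan. The sketch below does that with SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. This is an assumption-laden stand-in, not Sybase: the table `t`, its columns, and the `ix_*` indexes are hypothetical, and another engine's optimizer may choose differently for the same schema.

```python
import sqlite3


def build_table() -> sqlite3.Connection:
    """Create a hypothetical table with five indexes that all lead with
    the same date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ins_date INTEGER, a INTEGER, b INTEGER, "
        "c INTEGER, d INTEGER, e INTEGER)"
    )
    for col in ("a", "b", "c", "d", "e"):
        conn.execute(f"CREATE INDEX ix_{col} ON t (ins_date, {col})")
    return conn


def plan_for(conn: sqlite3.Connection, sql: str) -> str:
    """Return the human-readable plan text SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The plan description is the last column of each returned row.
    return "; ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = build_table()
    # Only the shared leading column is constrained: any index could serve.
    print(plan_for(conn, "SELECT * FROM t WHERE ins_date = 5"))
    # A second column is constrained: one index is a clearly better match.
    print(plan_for(conn, "SELECT a FROM t WHERE ins_date = 5 AND a = 10"))
```

The first query is the interesting case for this discussion: with only the shared leading column in the WHERE clause, the optimizer has several equally plausible candidates to choose from.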

There is also the maintenance cost when the data changes: updating five indexes takes noticeably longer than updating just one, and you are using roughly five times as much disk storage for five indexes as for one.
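The storage side of that cost can be checked directly. The following sketch loads the same rows into a table with one index and then with five, and compares the number of database pages used. It assumes SQLite via Python's `sqlite3` as a stand-in for the real server; the table `t`, its columns, and the `ix_*` index names are hypothetical, and the exact ratio will depend on the engine, the row width, and the index definitions.

```python
import sqlite3

COLUMNS = ["a", "b", "c", "d", "e"]  # hypothetical non-date columns


def pages_used(index_count: int, rows: int = 5000) -> int:
    """Return the number of database pages used by a table holding `rows`
    rows plus `index_count` indexes that all lead with the date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ins_date INTEGER, a INTEGER, b INTEGER, "
        "c INTEGER, d INTEGER, e INTEGER)"
    )
    for col in COLUMNS[:index_count]:
        conn.execute(f"CREATE INDEX ix_{col} ON t (ins_date, {col})")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
        ((i // 50, i, i * 2, i * 3, i * 5, i * 7) for i in range(rows)),
    )
    conn.commit()
    # page_count covers the table and all of its indexes together.
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


if __name__ == "__main__":
    one, five = pages_used(1), pages_used(5)
    print(f"1 index: {one} pages, 5 indexes: {five} pages")
```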

Jonathan Leffler
100% agree that the time to execute (assuming the optimizer does not screw up) is unaffected. The environment is 100% dynamic queries, no stored procs on that table.
DVK
What I'm interested in is just HOW much of a drag it will be, especially considering that the "competing" configuration has 4 indexes unchanged and only one with different starting column.
DVK
A: 

I do not wish to speak for your senior colleague but I believe you have misinterpreted what he said, or he has not expressed himself explicitly enough for you to understand.

One of the things that stands out about poorly designed, and therefore poorly performing, tables is that they have many indices on them, and the leading columns of the indices are all the same. Every single time.

So it is pointless debating (the debate is too isolated) whether there is a server cost for indices which all have the same leading columns; the problem is the poorly designed table which exposes itself in myriad ways. That is a massive server cost on every access. I suspect that that is where your esteemed colleague was coming from.

A monotonic column is a very poor choice for an index (understood, you need at least one). But when you use that monotonic column to force uniqueness in some other index, which would otherwise be irrelevant (due to low cardinality, such as SexCode), that is another red flag to me. You've merely forced an irrelevant index to be slightly relevant; the queries, except for the single covered query, perform poorly on anything beyond the simplest select via primary key.

There is no such thing as a "covered index", but I understand what you mean, you have added an index so that a certain query will execute as a covered query. Another flag.

I am with Mitch, but I am not sure you get his drift.

Last, responding to your question in isolation: having five indices with the leading columns all the same would not cause a "performance problem" beyond that which you already have due to the poor table design, but it will cause angst and unnecessary manual labour for the developers chasing down weird behaviour, such as "how come the optimiser used index_1 for my query but today it is using index_4?".

Your language consistently (and particularly in the comments) displays a manner of dealing with issues in isolation. The concept of a server and a database, is that it is a shared central resource, the very opposite of isolation. A problem that is "solved" in isolation will usually result in negative performance impact for everyone outside that isolated space.

If you really want the problem dealt with, fully, post the CREATE TABLE statement.

PerformanceDBA
I've just read your related question. This confirms that your indices I1 to I4 are incorrect; you've over-engineered them. The result is that each of them is much, much wider than it needs to be, with too many columns in each. On a 100GB table that will cause a substantial performance issue overall, and for maintenance (nightly and weekly batch runs) as well. Post the CREATE TABLE and CREATE INDEX statements and we will fix it for you.
PerformanceDBA