views: 293
answers: 6
Good day,

I have about 4GB of data, split across about 10 tables. Each table has a lot of columns, and each column can be a search criterion in a query. I'm not a DBA at all, and I don't know much about indexes, but I want to speed up searches as much as possible. The important point is that there won't be any updates, inserts, or deletes at any time (the tables are repopulated once every 4 months). Is it appropriate to create an index on each and every column? Remember: no inserts, updates, or deletes, only selects! Also, if I can make all of these columns integer instead of varchar, would it make a difference in speed?

Thank you very much!

+3  A: 

Have you looked at running the Index Tuning Wizard? It will suggest indexes based on a workload.
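
The wizard needs a workload to analyze; in SQL Server 2000, one simple option is to point it at a .sql script of representative queries. A minimal sketch (the table and column names here are hypothetical, not from the question):

-- workload.sql: representative queries for the Index Tuning Wizard to analyze
SELECT doc_id, title FROM documents WHERE doc_type = 'X'
SELECT doc_id, title FROM documents WHERE state = 'A' AND doc_type = 'X'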

KiwiBastard
@KiwiBastard (that could be anyone in NZ, hello from Oz :-), good answer, +1. Does the wizard do stats on the fly (to keep the optimizer up to date), or does it just suggest new DDL commands to apply to the tables? DB2 has runstats, which changes the plan paths based on the data in the tables.
paxdiablo
+1  A: 

Two pieces of missing information: how many distinct values are in each column, and which DBMS you're using. If you're using Oracle and have fewer than a few thousand distinct values per column, you can create bitmap indexes. These are very space- and execution-efficient for exact matches.
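
For reference, creating a bitmap index in Oracle looks like this (the table and column names are hypothetical):

-- Oracle only: bitmap indexes suit low-cardinality columns in read-mostly tables
CREATE BITMAP INDEX ix_documents_doc_type ON documents (doc_type);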

Otherwise, it's a tradeoff: each index will add roughly the same amount of space as a one-column table containing the same data, so indexing every column will at least double (more likely 2.5x) your space requirements. So maybe 10GB, which isn't a whole lot of data.

Then there's the question of whether your DBMS will efficiently merge multiple index-based selects. It's quite possible that it won't, unless you do self-joins for every column that you're selecting against.
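
To illustrate the self-join workaround (hypothetical names): each aliased copy of the table can satisfy its predicate from its own single-column index, and the join on the key intersects the results.

-- Each alias can use a different single-column index; the key join intersects them
SELECT a.*
FROM documents a
JOIN documents b ON b.doc_id = a.doc_id
WHERE a.doc_type = 'X'  -- index on doc_type
  AND b.state = 'A'     -- index on state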

Best answer: try it on a smaller dataset (so that you're not spending all your time building the indexes) and see how it works.

kdgregory
FYI: tagged as MSSQL2000
KiwiBastard
+6  A: 

Answer: no. Indexing every column separately is not good design. Indexes often need to comprise multiple columns, and there are different types of indexes for different requirements.
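
For example, a single composite index can serve queries that filter on both columns, and on the leading column alone (the names below are hypothetical):

-- Covers WHERE doc_type = ? AND state = ?, and WHERE doc_type = ? alone
CREATE INDEX ix_documents_type_state ON documents (doc_type, state)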

The tuning wizard mentioned in other answers is a good first cut (esp. for a learner).

Don't try to guess your way through it, or hope you understand complex analyses; get advice specific to your situation. There are several quite active threads here on specific situations and query optimization.

le dorfier
Also, don't forget that if there are many columns, it will take the optimizer longer to work out which indexes will help and which won't. Many, possibly most, columns will not need indexes; only those that are actively used in filter conditions benefit you.
Jonathan Leffler
@Jon, that's why real databases (like DB2 :-) have runstats and the like, so they can keep the optimizer abreast of the data distribution within tables. The optimizer can then easily select the best index, no matter how many there are.
paxdiablo
@doofle, the question states that every column is subject to search; hence, for maximum speed, they should all be indexed, possibly along with indexes on multi-column groups.
paxdiablo
@Pax, he's asking about a single-field index on each column. And if a column is the first in a composite index, it doesn't need another index of its own. Furthermore, indexes on boolean-like fields are ignored, so a blanket rule is too naive for those cases.
le dorfier
@doofle, read the question - there are NO bitfields, EVERY column is subject to a search.
paxdiablo
@Pax, I didn't say "bit", I said "boolean": "Y"/"N", "1"/"0"; apparently it's all in varchars. Please try to focus on the issue at hand. You try explaining cardinality if you can do better. But the point is valid however you put it. Blindly indexing is bad policy.
le dorfier
Blindly indexing is bad policy, but all the information was provided: "each column can be a search criteria in a query". For maximum speed IN THIS CASE (not generally), the more indexes, the better, including multi-column keys if those types of queries are also possible.
paxdiablo
I agree with your comments in general (I haven't downvoted, BTW); it's just that in this case, there's enough info to justify what the questioner wanted to do.
paxdiablo
DocId is probably unique so an index is no better than a full table scan (if it is unique). Similarly, DocType/DocId as a key offers no advantage over just DocType. So, if my assumptions are correct, probably just DocType. More to come...
paxdiablo
A DocId index would make sense if the record size were much larger than the key, but only in terms of search items per disk page: you can load more keys than records with a single I/O.
paxdiablo
A concrete question for this dialogue: two of the fields are "Document Type" and "Document ID". There are only two document types. Do you put one index on each, or one compound index on both, or all three? There is also a CHAR(1) "State" field, with "I"nactive and "A"ctive. Index?
le dorfier
That's as far as I'm interested in taking my point. Some fields by their nature don't require indexes. It's more complicated than "index everything", which is what I heard him asking. Even then, it's not *that* important a point.
le dorfier
Yeah, I don't want to hit any possible hard limits SO may have imposed on comments per answer :-). Cheers.
paxdiablo
You could always make that an SO question. "What are the hard limits?"-- Cheers.
le dorfier
+3  A: 

Absolutely not.

You have to understand how indexes work. If you have a table of, say, 1000 records, and you index a BIT column that can only hold one of two values, the index will be worthless because it is not selective enough. When you create an index on a column, be very cognizant of what types of selects will be run against the table: will that index be selective enough for the optimizer to use effectively?

To that point, you may very well find that a few carefully chosen composite indexes vastly outperform many single-column indexes. The golden rule: how the database is queried determines how you should build your indexes.
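
A quick way to gauge selectivity before creating an index (hypothetical names) is to compare distinct values to total rows:

-- A ratio near 1.0 means highly selective; near 0.0 means a near-useless index
SELECT COUNT(DISTINCT doc_type) * 1.0 / COUNT(*) AS selectivity
FROM documents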

Dave Markle
@Dave, the question was about varchars, and every column is searchable, so while your response is good for a general question about indexes, it's not really applicable here. Your golden rule is right, but you already have the required info to make a decision.
paxdiablo
Just because a column has VARCHARs in it does NOT mean that the index is selective! The BIT example was used just to illustrate something that obviously couldn't be selective. The same would be true if your VARCHAR column only had 2 or 3 distinct values per 1000 rows...
Dave Markle
A: 

Is creating an index on every column worth the maintenance headache and the additional disk-space overhead? What about your table population every 4 months? The time it takes will be drastically affected.

I also feel there is something very wrong with creating an index on every column just because you can. It's like using the WITH (NOLOCK) hint on every query without understanding why; it's bound to come back and bite you some day.

The Index Tuning Wizard is a good way to get started with indexes, although be cautious about applying its suggestions blindly.

A way to get faster performance is to set the database to read-only mode. This means no locks need to be acquired (because updates/inserts cannot occur), which gives you a big performance boost! (Yes, locks are acquired in a database even when you perform a SELECT.) The syntax for making a database read-only is:

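-- SQL Server 2000: mark the database read-only (use 'FALSE' to reverse)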
EXEC sp_dboption 'dbname', 'read only', 'TRUE'
Nick Kavadias
The other option is to run your queries in a read-uncommitted transaction scope. Essentially the same thing as NOLOCK.
Frustrating Developments
essentially the same for this scenario anyway
Frustrating Developments
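
For reference, the read-uncommitted approach mentioned in the comments above looks like this in T-SQL (the table name is hypothetical):

-- Session-level equivalent of putting NOLOCK on every table in every query
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM documents WHERE doc_type = 'X'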
A: 

If you select more columns from the table than the chosen index covers, you will inevitably incur a bookmark lookup in the query plan: the query processor has to retrieve the non-covered columns from the clustered index, using the reference ID stored in the leaf rows of the associated non-clustered index.

In my experience, bookmark lookups can really kill query performance, due to the volume of extra reads required and the fact that each row in the clustered index has to be resolved individually. This is why I try to make NC indexes covering wherever possible. That's easier on smaller tables where the required query plans are well known, but if you have large tables with lots of columns and arbitrary queries expected, it probably won't be feasible.

This means you only get bang for your buck from an NC index of any kind if the index is covering, or if it selects a small enough data set that the cost of a bookmark lookup is mitigated. Indeed, you may find that the query optimizer won't even look at your indexes if that cost is prohibitive compared to a clustered index scan, where all the columns are already available.

So there is no point in creating an index unless you know that index will optimize the result of a given query. The value of an index is therefore proportional to the percentage of queries that it can optimize for a given table, and this can only be determined by analyzing the queries that are being executed, which is exactly what the Index Tuning Wizard does for you.
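
As a concrete sketch (hypothetical names; note that SQL Server 2000 has no INCLUDE clause, so a covering index must carry every queried column in its key):

-- This index covers the query below entirely, so no bookmark lookup is needed
CREATE NONCLUSTERED INDEX ix_orders_covering
    ON orders (customer_id, order_date, total)

SELECT order_date, total
FROM orders
WHERE customer_id = 42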

So, in summary:

1) Don't index every column. This is classic premature optimization. You cannot optimize a large table with indexes for all possible query plans in advance.

2) Don't index any column, until you have captured and run a base workload through the Index Tuning Wizard. This workload needs to be representative of the usage patterns of your application, so that the wizard can determine what indexes would actually help the performance of your queries.