A: 

I would consider putting a clustered index on the large tables. The clustered index defines the physical order in which the records are stored. The consequence is that the rows in the table can be stored more efficiently, reducing fragmentation. I'm sure there will be at least one column in each table that is a candidate for a clustered index. (And if not, you could create a new column that contains the date and time the record was created, and put a clustered index on that column. I think that this is still better than no CI at all.)
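As a sketch of that fallback approach (table, column, and index names here are made up for illustration):

```sql
-- Hypothetical table name; add a creation-timestamp column with a default.
ALTER TABLE dbo.BigLogTable
    ADD CreatedAt datetime NOT NULL
        CONSTRAINT DF_BigLogTable_CreatedAt DEFAULT (GETDATE());

-- A clustered index does not have to be unique; for duplicate key values
-- SQL Server adds a hidden 4-byte "uniquifier" behind the scenes.
CREATE CLUSTERED INDEX CIX_BigLogTable_CreatedAt
    ON dbo.BigLogTable (CreatedAt);
```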

Edit: if the large tables are indeed log tables that aren't read frequently, then they can be left as heaps.

Frederik Gheysels
All the tables do have a column(s) to index, and in fact have indexes, just not clustered ones. I don't like date indexes: it's too easy to generate duplicates, resulting in insert failures (INSERT from a SELECT).
KM
That's true about the duplicates, but a CI does not have to be a unique index (although it is preferable that it be one).
Frederik Gheysels
yea, duh, I was thinking unique
KM
A: 

It depends on how the tables are used. Normally I want a clustered index on a table with millions of records, but you also need to consider the usage pattern. Adding an index will slow down inserts because the engine has to look up the proper page for each new record (and possibly allocate a new page) rather than just append it. If these tables are primarily "dumps" for data and are rarely read (like emergency logging, for example), then leaving them alone might be better.

As always, you should profile to find out what works best for your application or system.

Joel Coehoorn
see edit.......
KM
+4  A: 

Sounds like the database was created by someone who knew what he was doing. Log tables and small code tables are exactly where heaps make sense.

If there are no current problems with the database, I'd leave it as it is!

Andomar
easy answer if they were all log and code tables, but they are not. Also, I'm researching a slow point in the application, and I will fix one of the large heap tables. I'm just wondering if I should change a few other tables in the same manner.
KM
You asked "Would you create clustered indexes on them selectively, across the board, or not at all". I'd say "not at all", unless there was a "slow point" which is demonstrably solved by adding the clustered index :)
Andomar
I'm with Andomar - sounds like the designer knew what he was doing, and so I'd say "no". That a slow point developed later after the size of the data increased is no discredit to the designer, and adding an index to address that particular slow down is perfectly fine - just don't overdo it. Unused indexes can have the opposite effect of their intention and _kill_ your performance. You should PROFILE to find out where, exactly, your issues are and what will work best to correct them.
Joel Coehoorn
I like your reasoning about only fixing the ones with known problems. However, the "Sounds like the database was created by someone who knew what he was doing"... ha ha. If you only took one look at the code! I could go on and on about the cursor loops in triggers (easily rewritten as single statements), few if any actually defined FKs, almost zero indexes except PKs, etc., but I'll probably start crying ;-(
KM
What is the harm in adding a clustered index to a "codes" table? They have very few inserts but many, many reads. If I insert some values when the table is created and then a year later insert a few more, couldn't this create a fragmented heap that is going to be read many times?
KM
A heap performs better in every scenario for tables with fewer than about 250 rows. But the difference is so small you would need a lab setup to measure it :)
Andomar
+1  A: 

For a large table, a clustered index is always a good idea, even for an insert-only table. Of course, your clustering key should be an ever-increasing value.
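A common way to get an ever-increasing clustering key is an IDENTITY surrogate column; a minimal sketch (table and index names are hypothetical):

```sql
-- Hypothetical insert-only table: add a monotonically increasing surrogate key.
ALTER TABLE dbo.BigInsertOnlyTable
    ADD Id bigint IDENTITY(1, 1) NOT NULL;

-- New rows always land on the last page, so no page splits mid-table.
CREATE UNIQUE CLUSTERED INDEX CIX_BigInsertOnlyTable_Id
    ON dbo.BigInsertOnlyTable (Id);
```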

Mladen Prajdic
+2  A: 

Always add a clustered index. Without one, you cannot quickly compact or defragment the table.

Simplistic, but I bet some of the performance issues could be traced to badly organised data.
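Once a clustered index exists, the table itself can be defragmented with standard maintenance commands (index and table names here are illustrative):

```sql
-- Full rebuild: recreates the index, removing fragmentation entirely.
ALTER INDEX CIX_BigLogTable_CreatedAt ON dbo.BigLogTable REBUILD;

-- Lighter-weight alternative: reorders leaf pages in place.
ALTER INDEX CIX_BigLogTable_CreatedAt ON dbo.BigLogTable REORGANIZE;
```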

gbn
A: 

One problem with a clustered index on a large table is that the storage required for the index equals the size of the table, because the clustered index *is* the table; there is no separate structure. A non-clustered index stores only the indexed columns plus the clustering key or a row identifier (RID), so a normal index is much more likely to fit in RAM. If you are doing searches using the clustered index and the table is large, you could easily be slowing things down. If your clustered index is on a date column and your searches are all for recent dates, then maybe the clustered index won't hurt search performance, since you never touch most of it.

I disagree with posters claiming the clustered index will reduce data fragmentation. It can increase it. On a heap, only deleting causes fragmentation. As you add rows to a clustered table, or change a field of the clustering key, SQL Server has to keep the rows in physical order. This means splitting and adding data pages, which increases fragmentation. That's why everyone recommends being careful to pick a clustering key that (a) doesn't change often, if ever, and (b) always increments.
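Either way, fragmentation is measurable rather than a matter of opinion; a quick check against the `sys.dm_db_index_physical_stats` DMV (available since SQL Server 2005; the table name below is illustrative):

```sql
-- index_id 0 = heap, 1 = clustered index, >1 = non-clustered indexes.
SELECT index_id,
       index_type_desc,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.BigLogTable'), NULL, NULL, 'LIMITED');
```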

I find that a clustered index is useful on large tables when your queries often need to return multiple "related" rows. You can choose the clustering key so the related rows are stored consecutively and are easier for SQL Server to retrieve. I wouldn't necessarily cluster a large table just to get another index for searching.

The one advantage that clustering does have, like a covering index, is that the index contains the data the query is trying to return. There is no extra step from the index to the table to get the data.

In the end, you have to get the profiler out, and run some tests.

Am I getting this right or missing something?

johnnycrash