Thanks all for your help.
I have made 3 edits to mistakes in the original post.
1) The WHEREs should have been ANDs.
2) k4 should have been MONEY not VARCHAR. Also, k1 is of length 3.
3) The k2 should not be in the WHERE clause. As doofledorfer correctly points out, it makes no sense to have any other WHERE conditions once the full primary key is already specified.
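So, to restate, the query now looks roughly like this (table, the lowercase letters and g are placeholders, as in the original post):

    SELECT TOP(g) d1
    FROM table
    WHERE k1 = a AND k3 = c AND k4 = d AND k5 = e AND k6 = f
    ORDER BY k7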
Here are the answers to your questions:
Why have you clustered on the primary key?
I was under the impression that the PK was set as a clustered index by default. I did not change it.
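For anyone who wants to double-check, something like the following should show which index is clustered (table name is just a placeholder; assuming SQL Server 2005+):

    SELECT name, type_desc, is_primary_key
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.MyTable');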
Which columns can be NULL?
None.
What are the VARCHAR lengths?
I made a mistake with the column types. The only remaining VARCHAR is of length 3.
What does the query plan give you now?
Posted in the next post.
Help me understand more about the table.
if your PK is k1,k2, you shouldn't have to select by
any other column to get a completely unique record.
This was a mistake. The k2 part of the PK is not in the WHERE clause.
Knowing why you need around a million records
returned might help me provide a better solution.
The database contains daily records (the d1 TEXT column) of data. People need access to large amounts of this data to run their own reports.
They need to filter it by a number of values and have it delivered sorted by time.
It looks like you only want the earliest "g" records? Maybe only the most recent "g" records?
Yes, the latest, but I need a certain number of them. I don't know the start date beforehand.
Do you have foreign keys on k3, k4?
No. This is the only table in the DB.
Comments:
Even if the clustered index is proper, the more selective field should come first.
The more selective column is not used in the WHERE clause (post edit!). So I take it it should not come first in that case?
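For what it's worth, I'll measure selectivity with something like this before deciding on column order (table name assumed, and guessing at which keys to compare):

    SELECT COUNT(DISTINCT k1) AS k1_values,
           COUNT(DISTINCT k3) AS k3_values,
           COUNT(DISTINCT k4) AS k4_values
    FROM dbo.MyTable;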
You may want to Move data over a certain age to a history table
Currently all the data is used so pruning is not an option.
You may want to defrag the index
Currently I have none. Will look into it if this thread proves fruitful.
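If I do go down that route, I assume it would be something like this (table name assumed; assuming SQL Server 2005+):

    -- check fragmentation first
    SELECT index_id, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED');
    -- then rebuild (or REORGANIZE for lighter fragmentation)
    ALTER INDEX ALL ON dbo.MyTable REBUILD;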
Add a single index with columns k1-k6 in it; that should be the best.
Can anyone else comment on this suggestion? Liggett78 commented that this will double the size of the DB without helping much
because of the date-column sort. Note that the DATE column is not in the WHERE clause; it is only used for ordering the data.
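If I understand the suggestion, it would be something along these lines (table and index names are placeholders):

    CREATE NONCLUSTERED INDEX IX_AllKeys
    ON dbo.MyTable (k1, k2, k3, k4, k5, k6);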
Try turning k1, k2 into ints and
making them foreign keys, it'll use a
lot less storage for one, I'd have
thought and I think it should be
quicker (though I may be wrong there,
I guess SQL Server caches these
values).
k2 is a bigint (mistake in the orig post).
So changing k1 to an int (from a VARCHAR(3)) is an option. Do we really think this is going to make much difference?
And do people really think that splitting the table into k1,k2,d1 and k1,k2,k3,k4,k5,k7 and using foreign keys would improve things?
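Just so I'm clear on what's being proposed for k1, I take it to be something like this (all names are made up and the steps are only a sketch):

    -- lookup table mapping the 3-character codes to ints
    CREATE TABLE dbo.K1Lookup (
        k1_id   int        NOT NULL PRIMARY KEY,
        k1_code varchar(3) NOT NULL UNIQUE
    );
    -- add the new int column, backfill it from the lookup,
    -- then drop the old varchar(3) column and re-key on (k1_id, k2)
    ALTER TABLE dbo.MyTable ADD k1_id int NULL;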
One good tip to improve query speeds
is to put in a sub-query that cuts
down your recordset size to a more
manageable one. There is likely to be
some set of data that immediately cuts
the recordset down from, say 10
million rows, to 10,000.
e.g. SELECT TOP(g) d1 FROM
(SELECT * FROM table WITH(NOLOCK)
WHERE k1 = a) AS t
WHERE k3 = c AND k4 = d AND k5 = e
AND k6 = f ORDER BY k7
Very interesting. Would this really help? It seems like SQL Server would be very stupid if it did not cut down the data in a similar manner itself.
Perhaps it
is the time taken by your UI or
whether to display the data, perhaps
it is the time taken by the Network ?
There is no UI. There certainly are network issues moving the data but I am only concerned with the time taken for the query to start returning results (I'm using an ADO.NET data reader) at the moment - one thing at a time :)
.. [to] see the most gains ... partition the table
Will a clustered index not have the same effect?
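For reference, my understanding is that partitioning would look roughly like this (boundary dates and names invented; assuming SQL Server 2005+):

    CREATE PARTITION FUNCTION pf_ByYear (datetime)
        AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01');
    CREATE PARTITION SCHEME ps_ByYear
        AS PARTITION pf_ByYear ALL TO ([PRIMARY]);
    -- the table's clustered index would then be created ON ps_ByYear(k7)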
Leave your primary key alone, but
create a clustered index on your date
column, since this is what you use in
ORDER BY. That way the database engine
would begin to scan the clustered key,
compare columns with your supplied
values and output rows that satisfy
the conditions.
Sounds like a sound plan! Any other backers?
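If I try this, I assume it amounts to the following (constraint and index names are made up; mine differ):

    -- drop the clustered PK, cluster on the date column, re-add the PK as non-clustered
    ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
    CREATE CLUSTERED INDEX IX_MyTable_k7 ON dbo.MyTable (k7);
    ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (k1, k2);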
To summarize the suggestions:
1) Create separate indexes on all keys: most people vote no on this?
2) Create separate indexes on the keys with most distinct values.
3) Create a multiple column index on some of the columns, with the columns with the most distinct values first.
4) Throw RAM at it.
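For option 2, I take it that means something like this (guessing at which keys are most selective; table name assumed):

    CREATE NONCLUSTERED INDEX IX_k3 ON dbo.MyTable (k3);
    CREATE NONCLUSTERED INDEX IX_k4 ON dbo.MyTable (k4);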