views:

17

answers:

3

Here I am studying nonclustered indexes on SQL Server Management Studio.

I've created a table with more than 1 million records. This table has a primary key.

CREATE TABLE [dbo].[Customers](
    [CustomerId] [int] IDENTITY(1,1) NOT NULL,
    [CustomerName] [varchar](100) NOT NULL,
    [Deleted] [bit] NOT NULL,
    [Active] [bit] NOT NULL,
 CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
(
    [CustomerId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,     ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

This is the query I'll be using to see what execution plan is showing:

SELECT CustomerName FROM Customers

Well, executing this command with no additional non-clustered index, it leads the execution plan to show me:

I/O cost = 3.45646
Operator cost = 4.57715

Now I'm trying to see if it's possible to improve performance, so I've created a non-clustered index for this table:

1) First non-clustered index

CREATE NONCLUSTERED INDEX [IX_CustomerID_CustomerName] ON [dbo].[Customers] 
(
    [CustomerId] ASC,
    [CustomerName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON,
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Executing again the select against Customers table, the execution plan shows me:

I/O cost = 2.79942
Operator cost = 3.92001

It seems better. Now I've deleted this just created non-clustered index, in order to create a new one:

2) First non-clustered index

CREATE NONCLUSTERED INDEX [IX_CustomerIDIncludeCustomerName] ON [dbo].[Customers] 
(
    [CustomerId] ASC
)
INCLUDE ( [CustomerName]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,  
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,   
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

With this new non-clustered index, I've executed the select statement again and the execution plan shows me the same result:

I/O cost = 2.79942
Operator cost = 3.92001

So, which non-clustered index should I use? Why the costs are the same on execution plan for I/O and Operator? Am I doing something wrong or this is expected?

thank you

+2  A: 

This is because of "CustomerName" being INCLUDE-ed in the second index (see this about INLCUDEd columns).

Basically, BOTH indexes work the same exact way for you - they are covered indexes with the first index column NOT matching the WHERE clause.

Meaning that the query in both cases will be scanning the index but NOT touching the table.

The index that I'd expect ti be better performing for that specific query would be an index on CustomerName alone.

DVK
Indeed, the I/O cost and Operator cost increase a little when I've created an index on CustomerName alone.
Junior Mayhé
@Junior Mayhé: DVK is 100% correct but you haven't created an index on CustomerName *alone* If you do, I'd expect a considerable drop in cost. eg CREATE INDEX IX_Custname ON dbo.Customers (CustomerName)
gbn
@DVK: nice comment! I think this helped me to create just needed nonclustered columns for my database. On the link Microsoft is telling to avoid adding too many unnecessary columns
Junior Mayhé
+1  A: 

You're not going to notice much difference with or without indexes until you make use of the index by filtering the results using WHERE, ordering the results with ORDER, or joining the results to another table on the indexed column.

Try doing a query like this without an index:

SELECT *
FROM Customers
WHERE CustomerName = 'Marcus Adams'

Then add an index on the CustomerName column and try again.

You'll also need enough rows in the table that the database system will actually use the index, and enough that you'll notice the difference between scanning the rows and using the index.

Marcus Adams
SELECT * will typically force a clustered scan. If not, it will do an expensive bookmark/key lookup
gbn
@gbn, the `SELECT *` was for brevity, but thanks for the info.
Marcus Adams
yes I also did something like SELECT CustomerName FROM CustomersWHERE CustomerName like '%Marcus%'. It seems to reduce the cost when setting nonclustered index on CustomerName
Junior Mayhé
A: 

Neither of your two nonclustered indices makes a lot of sense, really.

The point is this: the column(s) of the clustered index - in your case CustomerId - is already included in every single entry of every single non-clustered index you have. That clustering column after all is what is used for the actual data lookup, if an entry is found. So adding that to a non-clustered index typically is superfluous and just a waste of space.

The question is more: how do you select those rows that you want to have displayed? What columns will show up in the WHERE clause?

If you find a pattern here (e.g. you always select by e.g. City), then the non-clustered index to suit your needs would be

CREATE NONCLUSTERED INDEX [IX_Customer_City] ON [dbo].[Customers] 
(
    [City] ASC
)
INCLUDE ( [CustomerName]) 

That way, you give SQL Server a method to easily find rows that match a given city, and including the column that you want to have returned (CustomerName) allows SQL Server to get the necessary information directly from the index page (making it a so-called covering index - it covers your query, e.g. returns all the information needed) - you won't need to do a "bookmark lookup", e.g. fetch the entire Customer data row from the actual data pages (finding it via the CustomerId which is in the non-clustered index, too, since it's the clustering key).

marc_s
I got your point. I think I missed the Where clause on my sample. Here I was making tests using Where clause to compare those costs. The idea is to add more columns to Customers table, some of them will be used for where clause and also for inner/outer joins.
Junior Mayhé

related questions