views:

1146

answers:

4

While studying for the 70-433 exam I noticed you can create a covering index in one of the following two ways.

CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)

-- OR --

CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)

The INCLUDE clause is new to me. Why would you use it and what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?

+5  A: 

If not in the WHERE or JOIN or ORDER BY, but only in the column list in the SELECT clause.

The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree

Another MSDN article with a worked example

gbn
+1  A: 

Basic index columns are sorted, but included columns are not sorted. This saves resources in maintaining the index, while still making it possible to provide the data in the included columns to cover a query. So, if you want to cover queries, you can put the search criteria to locate rows into the sorted columns of the index, but then "include" additional, unsorted columns with non-search data. It definitely helps with reducing the amount of sorting and fragmentation in index maintenance.

onupdatecascade
+8  A: 

You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.

Imagine you need to query for an employee's ID, department ID, and lastname.

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5

If you happen to have a non-clustered index on (EmployeeID, DepartmentID), once you find the employees for a given department, you now have to do "bookmark lookup" to get the actual full employee record, just to get the lastname column. That can get pretty expensive in terms of performance, if you find a lot of employees.

If you had included that lastname in your index:

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(EmployeeID, DepartmentID)
  INCLUDE (Lastname)

then all the information you need is available in the leaf level of the non-clustered index. Just by seeking in the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary --> you save a lot of time.

Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.

Marc

marc_s
+1  A: 

The reasons why (including the data in the leaf level of the index) have been nicely explained. The reason that you give two shakes about this, is that when you run your query, if you don't have the additional columns included (new feature in SQL 2005) the SQL Server has to go to the clustered index to get the additional columns which takes more time, and adds more load to the SQL Server service, the disks, and the memory (buffer cache to be specific) as new data pages are loaded into memory, potentially pushing other more often needed data out of the buffer cache.

mrdenny