I have a table which may have anywhere from 10,000 to 10 million rows during the life of the application. This table contains NACSZ info, among other things, and I need to write a query that checks for matching NACSZ like so:
select
*
from
Profiles
where
FirstName = 'chris' and
LastName = 'test' and
Address1 = '123 main st' and
City = 'chicago' and
State = 'il' and
Zip = '11111'
I'm trying to optimize the table for this query which will be run inside an "if exists()" construct, and not having much luck. With no index, and ~110,000 rows of test data, I get:
Table 'Profiles'. Scan count 1, logical reads 2021, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 70 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
The execution plan indicates a clustered index scan on the primary key. I thought about adding an index for all the values, but that would create a rather large key and I'd like to try to avoid that if possible. My next thought was to index something that would be reasonably different between rows, so I tried indexing on first name (though I could use address line 1, or zip, for example), so I created the index:
create index ix_profiles_firstName on profiles(FirstName)
Now, when I run the same query, I get
Table 'Profiles'. Scan count 1, logical reads 171, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 52 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
Obviously, indexing first name made a HUGE difference. My question is, how do I decide whether I should index first name vs last name vs address vs. zip code? Is there a command I can run over my sample data that will tell me uniqueness of values in each field? My understanding is that I should try to index the column with the most uniqueness to make the index work best, correct?