I have a table which may have anywhere from 10,000 to 10 million rows over the life of the application. This table contains NACSZ info (name, address, city, state, ZIP), among other things, and I need to write a query that checks for a matching NACSZ like so:

select 
    * 
from 
    Profiles 
where 
    FirstName = 'chris' and
    LastName = 'test' and
    Address1 = '123 main st' and
    City = 'chicago' and
    State = 'il' and
    Zip = '11111'

I'm trying to optimize the table for this query, which will be run inside an "if exists()" construct, but I'm not having much luck. With no index, and ~110,000 rows of test data, I get:
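
For reference, the check is used roughly like this to prevent duplicate inserts (simplified):

if not exists
(
    select 1
    from Profiles
    where FirstName = 'chris' and
        LastName = 'test' and
        Address1 = '123 main st' and
        City = 'chicago' and
        State = 'il' and
        Zip = '11111'
)
begin
    insert into Profiles (FirstName, LastName, Address1, City, State, Zip)
    values ('chris', 'test', '123 main st', 'chicago', 'il', '11111')
end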

Table 'Profiles'. Scan count 1, logical reads 2021, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 16 ms, elapsed time = 70 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

The execution plan indicates a clustered index scan on the primary key. I thought about adding an index on all of the values, but that would create a rather large key, and I'd like to avoid that if possible. My next thought was to index something that would be reasonably different between rows, so I tried indexing on first name (though I could have used address line 1 or zip, for example) and created the index:

create index ix_profiles_firstName on profiles(FirstName)

Now, when I run the same query, I get:

Table 'Profiles'. Scan count 1, logical reads 171, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 52 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

Obviously, indexing first name made a HUGE difference. My question is: how do I decide whether to index first name vs. last name vs. address vs. zip code? Is there a command I can run over my sample data that will tell me the uniqueness of the values in each field? My understanding is that I should index the column with the most uniqueness to make the index work best; is that correct?

+3  A: 

For your query, you should create a composite index on all columns: (FirstName, LastName, Address1, City, State, Zip)
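
For example (the index name here is just for illustration):

CREATE INDEX IX_Profiles_NACSZ ON Profiles (FirstName, LastName, Address1, City, State, Zip)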

If you want to force SQL Server to use a certain index, issue:

SELECT  *
FROM    Profiles WITH (INDEX (index_name))
WHERE 
        FirstName = 'chris' and
        LastName = 'test' and
        Address1 = '123 main st' and
        City = 'chicago' and
        State = 'il' and
        Zip = '11111'

My question is, how do I decide whether I should index first name vs last name vs address vs. zip code?

Index all of the columns you are filtering on.

Note that you can efficiently filter on any leading prefix of the index's columns. For example:

SELECT  *
FROM    Profiles
WHERE   FirstName = 'chris'

will use the index to search on FirstName,

SELECT  *
FROM    Profiles
WHERE   FirstName = 'chris'
        AND LastName = 'test'

will use the index to search on both FirstName and LastName,

SELECT  *
FROM    Profiles
WHERE   FirstName = 'chris'
        AND City = 'chicago'

will use the index to search only on FirstName (since you don't filter on LastName, there is a gap in the key, and the index cannot be used to search on the columns after it).

Is there a command I can run over my sample data that will tell me uniqueness of values in each field?

SELECT   COUNT(DISTINCT FirstName) * 1.0 / COUNT(*) -- * 1.0 avoids integer division, which would always return 0
FROM     Profiles

will show you the selectivity of FirstName (the ratio of distinct values to total rows).

The closer this value is to 1, the more selective the column and the more efficient the index.
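
To compare the candidate columns side by side, you can compute the same ratio for each of them in one pass (a sketch using your column names):

SELECT  COUNT(DISTINCT FirstName) * 1.0 / COUNT(*) AS FirstNameSelectivity,
        COUNT(DISTINCT LastName) * 1.0 / COUNT(*) AS LastNameSelectivity,
        COUNT(DISTINCT Address1) * 1.0 / COUNT(*) AS Address1Selectivity,
        COUNT(DISTINCT Zip) * 1.0 / COUNT(*) AS ZipSelectivity
FROM    Profiles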

My understanding is that I should try to index the column with the most uniqueness to make the index work best, correct?

Yes.

Again, in your case you should index all of the columns together: the most uniqueness is, of course, in all of the columns taken together.

Quassnoi
I tried the composite index before and wasn't getting good results. For whatever reason, I just tried it again and got even better results. I use JMeter to submit randomized data to a web page that calls this procedure, and was able to process 1500+ lookups/inserts per second. Awesome.
Chris
With the composite index: Table 'Profiles'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Chris
+1  A: 

My question is, how do I decide whether I should index first name vs last name vs address vs. zip code?

Gather all the queries you intend to use (if this is the only one, you're done). Then hand the queries over as a workload to the Index Tuning Wizard (the Database Engine Tuning Advisor in SQL Server 2005 and later) and look at the recommendations.
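
For example, you can save the query to a .sql file and feed it to the command-line version, dta (the server, database, and session names below are placeholders; double-check the flags against your version's documentation):

dta -E -S YourServer -D YourDatabase -if workload.sql -s ProfilesTuningSession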

My understanding is that I should try to index the column with the most uniqueness to make the index work best, correct?

The more unique an index is, the fewer rows have to be looked up in the actual table. The narrower the index is, the faster it can be read. (This second rule is why a composite index on all of the criteria columns is no good.)

David B
Regarding your last statement, that's what I was taught. In this case, though, the performance hit from using a composite index doesn't seem to be present or relevant. I'm going to load a few million test rows over the weekend and see if that changes.
Chris
A: 

In addition to other answers...

What combination of filters will you be running? Try to cover the most popular combinations.

gbn
The query will always try to match all columns; basically, I'm trying to prevent duplicate registrations.
Chris
Ta. Quassnoi's answer is best.
gbn
+1  A: 

If this query is as critical as it seems, then I suggest you create a derived column that concatenates the fields, and then build that same key explicitly in your query. Sure, it's redundant, but it may end up being simpler than dealing with the fields piecemeal if you don't have to.

le dorfier
+1  A: 

You have a couple of options. As Quassnoi indicated, you could create a composite index. Another option I used in a slightly different scenario was to generate a unique key based on the data. In my situation I was comparing addresses and trying to prevent duplicates (since we would geocode any new address, and each geocoding cost $$).

Anyway, essentially we took the key pieces of the address and created a new key (Address, State & Zip). You could do the same thing, and then compare against only one column.

One gotcha is making sure you keep this column synchronized when the record changes. You could look at using a computed column and indexing that, which would handle the synchronization for you.
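
A minimal sketch of that idea, using the column names from the question (the column name, index name, and '|' separator are mine):

alter table Profiles add NacszKey as
    (FirstName + '|' + LastName + '|' + Address1 + '|' + City + '|' + State + '|' + Zip)
go
create index ix_profiles_nacszKey on Profiles(NacszKey)

Then the duplicate check only has to compare a single indexed column. One caveat: if any of the columns is nullable, the '+' concatenation makes the whole key NULL, so wrap nullable columns in isnull().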

JoshBerke
I have used the method you describe in the past. Shame on me for not thinking of it in this case. Giving the particular scenario I'm testing for, I think Quassnoi's solution is sufficient, but I will keep yours in mind as well.
Chris