ansaurus

Question

Efficient SQL 2000 Query for Selecting Preferred Candy

Answer 1

A:

select name, candy, max(preference)
from tablename
where candy=@candy
order by name, candy

usually indexing is required on columns which are frequently included in where clause. In this case I would say indexing on name and candy columns would be of highest priority.

Having lookup tables for columns usually depends on number of repeating values with in columns. Out of 250,000 rows, if there are only 50 values that are repeating, you really need to have integer reference (foreign key) there. In this case, candy reference should be done and name reference really depends on the number of distinct people within the database.

SQL Baba 2009-06-28 17:20:13

Aggregate functions (such as MAX()) only make sense if you do a GROUP BY. But a simple GROUP BY will not work because the candy and the preference must remain associated.

Lucero 2009-06-28 21:43:06

Answer 2

A:

I changed your column Name to PersonName to avoid any common reserved word conflicts.

SELECT     PersonName, MAX(Candy) AS PreferredCandy, MAX(PreferenceFactor) AS Factor
FROM         CandyPreference
GROUP BY PersonName
ORDER BY Factor DESC

David Walker 2009-06-28 17:21:55

Nope, won't work. You'll get the max preference factor correct, but it will return the maximum candy, not the candy with maximum preference.

erikkallen 2009-06-28 18:06:47

Answer 3

+1 A:

hi you could use following select statements

select Name,Candy,PreferenceFactor
from candyTable ct 
where PreferenceFactor = 
    (select max(PreferenceFactor) 
     from candyTable where ct.Name = Name)

but with this select you will get "Chris" 2 times in your result set.

if you want to get the the most preferred food by user than use

select top 1 Name,Candy,PreferenceFactor
from candyTable ct
where name = @name
and PreferenceFactor= 
    (select max([PreferenceFactor]) 
     from candyTable where name = @name )

i think changing the name and candy to integer types might help you improve performance. you also should insert indexes on both columns.

[Edit] changed ! to @

nWorx 2009-06-28 17:57:39

this is the right idea

Peter 2009-06-28 18:17:01

The second sample is useless (what is "name != name" going to help?).Anyways, nested select statements which check against a column of the outer are very inefficient and will slow down the query more or less exponetially with the number of items.

Lucero 2009-06-28 21:41:43

it's not != i wrote = !name !name for a parameter... i don't know why but last time could not type the "@" signal, when i first wrote the answer.. strange

nWorx 2009-06-28 23:37:23

i've edited the answer now... i copied the @ signal, cannot use the keyboard function.

nWorx 2009-06-28 23:40:38

Answer 4

+1 A:

SELECT Name, Candy, PreferenceFactor
  FROM table AS a
 WHERE NOT EXISTS(SELECT * FROM table AS b
                   WHERE b.Name = a.Name
                     AND (b.PreferenceFactor > a.PreferenceFactor OR (b.PreferenceFactor = a.PreferenceFactor AND b.Candy > a.Candy))

erikkallen 2009-06-28 18:08:42

Nested select statements which check against a column of the outer select are very inefficient and will slow down the query more or less exponentially with the number of items.

Lucero 2009-06-28 21:44:38

Nope, SQL server is good at those and will usually convert them to a join.

erikkallen 2009-06-29 08:40:28

The 2005 and 2008 versions have been optimized in this direction, but I remember several queries very similar to this in SQL Server 2000 (and that's the version at hand here, I'd used a common table expression otherwise for this kind of task) which had serious performance problems; a look into the execution plan also showed this.

Lucero 2009-06-29 11:49:48

I like this one, but I wish you had formatted it to be more readable.

Peter 2009-06-29 23:29:36

@Lucero: You probably have some missing statistics or something otherwise wrong with your database. I've done this kind of things regularly even on 2000, and it has always worked well for me.

erikkallen 2009-07-12 16:05:10

@Peter: reformatted now

erikkallen 2009-07-12 16:06:20

Answer 5

A:

SELECT d.Name, a.Candy, d.MaxPref
FROM myTable a, (SELECT Name, MAX(PreferenceFactor) AS MaxPref FROM myTable) as D
WHERE a.Name = d.Name AND a.PreferenceFactor = d.MaxPref

This should give you rows with matching PrefFactor for a given Name. (e.g. if John as a HighPref of 1 for Lemon & Chocolate).

Pardon my answer as I am writing it without SQL Query Analyzer.

shahkalpesh 2009-06-28 18:20:41

Not bad, but won't work for the case where a person has several preferred candies with the same preference.

Lucero 2009-06-28 21:46:05

You are right Lucero. The query will have to be modified to pick the first row, when there is same preference for candies.

shahkalpesh 2009-06-29 03:24:52

Answer 6

A:

Something like this would work:

select name
, candy  = substring(preference,7,len(preference))
  -- convert back to float/numeric
, factor = convert(float,substring(preference,1,5))/10
from (
  select name, 
    preference = (
      select top 1 
           -- convert from float/numeric to zero-padded fixed-width string
           right('00000'+convert(varchar,convert(decimal(5,0),preferencefactor*10)),5)
         + ';' + candy
       from candyTable b
       where a.name = b.name
       order by 
         preferencefactor desc
       , candy
       )
  from (select distinct name from candyTable) a
  ) a

Performance should be decent with with method. Check your query plan.

TOP 1 ... ORDER BY in a correlated subquery allows us to specify arbitrary rules for which row we want returned per row in the outer query. In this case, we want the highest preference factor per name, with candy for tie-breaks.

Subqueries can only return one value, so we must combine candy and preference factor into one field. The semicolon is just for readability here, but in other cases, you might use it to parse the combined field with CHARINDEX in the outer query.

If you wanted full precision in the output, you could use this instead (assuming preferencefactor is a float):

convert(varchar,preferencefactor) + ';' + candy

And then parse it back with:

factor = convert(float,substring(preference,1,charindex(';',preference)-1))
candy = substring(preference,charindex(';',preference)+1,len(preference))

Peter 2009-06-28 18:31:19

Interesting. Could you comment on the rationale behind the conversion and such?

John Saunders 2009-06-28 18:36:52

The idea seems to be the combination of the preference and the candy into one sortable column, and then they need to be cut apart again later. However, this fails with preferences which have a very small difference (less than some 0.00001 or so).

Lucero 2009-06-28 21:48:30

@Lucero: no, it does not fail with preferences which have a very small difference. The sorting happens before conversion, not after.

Peter 2009-06-29 12:53:14

@John, I added some comments. It occurred to me the conversion to zero-padded fixed width might not be necessary.

Peter 2009-06-29 13:31:43

Answer 7

+2 A:

I tried:

SELECT X.PersonName,
 (
  SELECT TOP 1 Candy
  FROM CandyPreferences
  WHERE PersonName=X.PersonName AND PreferenceFactor=x.HighestPreference
 ) AS TopCandy
FROM 
(
 SELECT PersonName, MAX(PreferenceFactor) AS HighestPreference
 FROM CandyPreferences
 GROUP BY PersonName
) AS X

This seems to work, though I can't speak to efficiency without real data and a realistic load.

I did create a primary key over PersonName and Candy, though. Using SQL Server 2008 and no additional indexes shows it using two clustered index scans though, so it could be worse.

I played with this a bit more because I needed an excuse to play with the Data Generation Plan capability of "datadude". First, I refactored the one table to have separate tables for candy names and person names. I did this mostly because it allowed me to use the test data generation without having to read the documentation. The schema became:

CREATE TABLE [Candies](
 [CandyID] [int] IDENTITY(1,1) NOT NULL,
 [Candy] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Candies] PRIMARY KEY CLUSTERED 
(
 [CandyID] ASC
),
 CONSTRAINT [UC_Candies] UNIQUE NONCLUSTERED 
(
 [Candy] ASC
)
)
GO

CREATE TABLE [Persons](
 [PersonID] [int] IDENTITY(1,1) NOT NULL,
 [PersonName] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_Preferences.Persons] PRIMARY KEY CLUSTERED 
(
 [PersonID] ASC
)
)
GO

CREATE TABLE [CandyPreferences](
 [PersonID] [int] NOT NULL,
 [CandyID] [int] NOT NULL,
 [PrefernceFactor] [real] NOT NULL,
 CONSTRAINT [PK_CandyPreferences] PRIMARY KEY CLUSTERED 
(
 [PersonID] ASC,
 [CandyID] ASC
)
)
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Candies] FOREIGN KEY([CandyID])
REFERENCES [Candies] ([CandyID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Candies]
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Persons] FOREIGN KEY([PersonID])
REFERENCES [Persons] ([PersonID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Persons]
GO

The query became:

SELECT P.PersonName, C.Candy
FROM (
 SELECT X.PersonID,
  (
   SELECT TOP 1 CandyID
   FROM CandyPreferences
   WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
  ) AS TopCandy
 FROM 
 (
  SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
  FROM CandyPreferences
  GROUP BY PersonID
 ) AS X
) AS Y
INNER JOIN Persons P ON Y.PersonID = P.PersonID
INNER JOIN Candies C ON Y.TopCandy = C.CandyID

With 150,000 candies, 200,000 persons, and 500,000 CandyPreferences, the query took about 12 seconds and produced 200,000 rows.

The following result surprised me. I changed the query to remove the final "pretty" joins:

SELECT X.PersonID,
 (
  SELECT TOP 1 CandyID
  FROM CandyPreferences
  WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
 ) AS TopCandy
FROM 
(
 SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
 FROM CandyPreferences
 GROUP BY PersonID
) AS X

This now takes two or three seconds for 200,000 rows.

Now, to be clear, nothing I've done here has been meant to improve the performance of this query: I considered 12 seconds to be a success. It now says it spends 90% of its time in a clustered index seek.

John Saunders 2009-06-28 18:41:48

Nested select statements which check against a column of the outer select are inefficient and will slow down the query with a large number of items. Since you only do it once per name, this has less impact than in the suggestions others have made.

Lucero 2009-06-28 21:51:25

Can you characterize "inefficient"? The clustered index scan didn't seem like such a bad thing. Adding an index over PersonName, Preference changed it to use a nonclustered index seek and nonclustered index scan over the new index instead of using the primary key. Can't tell perf difference w/o test data.

John Saunders 2009-06-28 23:48:25

@Lucero: You are wrong. SQL server can without problem turn this into join equivalents. Check the execution plans, I promise you it does it if it finds it worth it.

erikkallen 2009-06-29 08:43:41

Are you guys doing that on SQL Server 2000? Please don't compare apples and oranges.

Lucero 2009-06-29 11:51:39

@Lucero: good point. I don't have SQL Server 2000 running anywhere. I don't think I've used it since about 2003.

John Saunders 2009-06-29 12:09:12

@John, having created the test data, did you test this against the other methods?

Peter 2009-06-29 23:33:42

@Peter: No, I didn't. Among other things, as Lucero points out, I'd be testing SQL Server 2008 against SQL Server 2000, on a laptop as opposed to a server. About the only useful comparison would be to compare the execution plans. BTW, the Database Tuning Advisor (they called it something different for SQL2000) had no additional recommendations.

John Saunders 2009-06-30 00:25:04

Hi John, I tried your results on my production SQL 2000 server - it resulted in execution times about 3 times slower than that of the Emtucifor and OrbMan solutions. I imagine that SQL 2008 has a better query optimizer than SQL 2000, which is why you don't see an appreciable difference.

Brian Gillespie 2009-06-30 17:05:48

Interesting. Can you characterize the execution plan? I think I may have captured the wrong one yesterday, as 90% on a clustered index seek didn't seem to make much sense. I'll try to post real numbers later today.

John Saunders 2009-06-30 18:01:43

Answer 8

+4 A:

select c.Name, max(c.Candy) as Candy, max(c.PreferenceFactor) as PreferenceFactor
from Candy c
inner join (
    select Name, max(PreferenceFactor) as MaxPreferenceFactor
    from Candy
    group by Name
) cm on c.Name = cm.Name and c.PreferenceFactor = cm.MaxPreferenceFactor
group by c.Name
order by PreferenceFactor desc, Name

RedFilter 2009-06-28 20:06:11

I'd go with this approach, well done.

Lucero 2009-06-28 21:53:22

I tried both this solution and Emtucifor's solution (which appears to be winning in voting). In my table, I have foreign key references to both Candy and Name, and have an index that covers these columns. In my testing both Emtucifor's and OrbMan's solutions were identical in query time, and gave identical results. Since OrbMan's solution doesn't involve the complexity of packing and unpacking data, I selected this answer as my accepted answer.

Brian Gillespie 2009-06-30 16:56:05

You are right to choose this answer as the right one if performance is similar. It all depends on the data. Could you post the query you used or send it to me at stackoverflow dit com atdomain esquared dit mooo dit com ? Thanks. dit -> . atdomain -> @

Emtucifor 2009-06-30 21:48:32

Answer 9

+6 A:

You will find that the following query outperforms every other answer given, as it works with a single scan. This simulates MS Access's First and Last aggregate functions, which is basically what you are doing.

Of course, you'll probably have foreign keys instead of names in your CandyPreference table. To answer your question, it is in fact very much best if Candy and Name are foreign keys into another table.

If there are other columns in the CandyPreferences table, then having a covering index that includes the involved columns will yield even better performance. Making the columns as small as possible will increase the rows per page and again increase performance. If you are most often doing the query with a WHERE condition to restrict rows, then an index that covers the WHERE conditions becomes important.

Peter was on the right track for this, but had some unneeded complexity.

CREATE TABLE #CandyPreference (
   [Name] varchar(20),
   Candy varchar(30),
   PreferenceFactor decimal(11, 10)
)
INSERT #CandyPreference VALUES ('Jim', 'Chocolate', 1.0)
INSERT #CandyPreference VALUES ('Brad', 'Lemon Drop', .9)
INSERT #CandyPreference VALUES ('Brad', 'Chocolate', .1)
INSERT #CandyPreference VALUES ('Chris', 'Chocolate', .5)
INSERT #CandyPreference VALUES ('Chris', 'Candy Cane', .5)

SELECT
   [Name],
   Candy = Substring(PackedData, 13, 30),
   PreferenceFactor = Convert(decimal(11,10), Left(PackedData, 12))
FROM (
   SELECT
      [Name],
      PackedData = Max(Convert(char(12), PreferenceFactor) + Candy)
   FROM CandyPreference
   GROUP BY [Name]
) X

DROP TABLE #CandyPreference

I actually don't recommend this method unless performance is critical. The "canonical" way to do it is OrbMan's standard Max/GROUP BY derived table and then a join to it to get the selected row. Though, that method starts to become difficult when there are several columns that participate in the selection of the Max, and the final combination of selectors can be duplicated, that is, when there is no column to provide arbitrary uniqueness as in the case here where we use the name if the PreferenceFactor is the same.

Edit: It's probably best to give some more usage notes to help improve clarity and to help people avoid problems.

As a general rule of thumb, when trying to improve query performance, you can do a LOT of extra math if it will save you I/O. Saving an entire table seek or scan speeds up the query substantially, even with all the converts and substrings and so on.
Due to precision and sorting issues, use of a floating point data type is probably a bad idea with this method. Though unless you are dealing with extremely large or small numbers, you shouldn't be using float in your database anyway.
The best data types are those that are not packed and sort in the same order after conversion to binary or char. Datetime, smalldatetime, bigint, int, smallint, and tinyint all convert directly to binary and sort correctly because they are not packed. With binary, avoid left() and right(), use substring() to get the values reliably returned to their originals.
I took advantage of Preference having only one digit in front of the decimal point in this query, allowing conversion straight to char since there is always at least a 0 before the decimal point. If more digits are possible, you would have to decimal-align the converted number so things sort correctly. Easiest might be to multiply your Preference rating so there is no decimal portion, convert to bigint, and then convert to binary(8). In general, conversion between numbers is faster than conversion between char and another data type, especially with date math.
Watch out for nulls. If there are any, you must convert them to something and then back.

Emtucifor 2009-06-29 22:22:55

Nice. I'm very impressed with the single scan. Can you suggest an improvement on my method that still supports an arbitrary ORDER BY?

Peter 2009-06-29 23:41:29

Thanks, Peter! A complex CASE expression that determines which column(s) to put in the packed value, plus the character positions and lengths to pull the columns back out, would do the trick (More CPU still cheaper than more I/O). If you're really interested in this, comment again and I'll try work something up for you. Or, can you send me a message by visiting my profile? Either way.

Emtucifor 2009-06-30 00:01:03

Now I understand what you were getting at before, that the correlated subquery is unnecessary. At first I thought it was a criticism of how I packed and unpacked the values. Yes, I'd be interested in seeing how you would pack more values. The general form, as it were. And no, I don't know how to send a private message on SO. I don't see an option for that on the profile page, and I don't see any contact information.

Peter 2009-06-30 00:26:11

I added a cryptified email address to my profile in the comments.

Emtucifor 2009-06-30 00:30:44

Thanks - in my production database, both Name and Candy are foreign keys, and they are included in an index.After implementing both this solution and OrbMan's solution, I got essentially identical performance. I wish I could split the "accepted" answer between the two of you, but OrbMan's seems to solve the problem without the additional complexity of the packing.

Brian Gillespie 2009-06-30 16:53:27

Thanks for the update, Brian! Performance, as usual, depends. On everything. On the exact data, number of rows, statistics, selectivity, rows per page, query execution plans chosen, and so on! If OrbMan's query gives similar performance then DEFINITELY use his. I have had situations, though, where this technique was many times over faster. Some questions: did you use conversion to binary or character? Did you run a trace to see CPU and I/O, use the query execution plan, or measure it in query execution time from the client? A trace is the only true way to see actual performance.

Emtucifor 2009-06-30 18:26:42

Answer 10

+2 A:

Comment on Emtucifor solution (as I cant make regular comments)

I like this solution, but have some comments how it could be improved (in this specific case).

It can't be done much if you have everything in one table, but having few tables as in John Saunders' solution will make things a bit different.

As we are dealing with numbers in [CandyPreferences] table we can use math operation instead of concatenation to get max value.

I suggest PreferenceFactor to be decimal instead of real, as I believe we don't need here size of real data type, and even further I would suggest decimal(n,n) where n<10 to have only decimal part stored in 5 bytes. Assume decimal(3,3) is enough (1000 levels of preference factor), we can do simple

PackedData = Max(PreferenceFactor + CandyID)

Further, if we know we have less than 1,000,000 CandyIDs we can add cast as:

PackedData = Max(Cast(PreferenceFactor + CandyID as decimal(9,3)))

allowing sql server to use 5 bytes in temporary table

Unpacking is easy and fast using floor function.

Niikola

-- ADDED LATER ---

I tested both solutions, John's and Emtucifor's (modified to use John's structure and using my suggestions). I tested also with and without joins.

Emtucifor's solution clearly wins, but margins are not huge. It could be different if SQL server had to perform some Physical reads, but they were 0 in all cases.

Here are the queries:

    SELECT
   [PersonID],
   CandyID = Floor(PackedData),
   PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
FROM (
   SELECT
      [PersonID],
      PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
   FROM [z5CandyPreferences] With (NoLock)
   GROUP BY [PersonID]
) X

SELECT X.PersonID,
        (
                SELECT TOP 1 CandyID
                FROM z5CandyPreferences
                WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
        ) AS TopCandy,
                    HighestPreference as PreferenceFactor
FROM 
(
        SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
        FROM z5CandyPreferences
        GROUP BY PersonID
) AS X


Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT [PersonID],
                    CandyID = Floor(PackedData),
                    PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
                    FROM ( SELECT [PersonID],
                                  PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
                             FROM [z5CandyPreferences] With (NoLock)
                            GROUP BY [PersonID]
                         ) X
            ) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT X.PersonID,
                    ( SELECT TOP 1 cp.CandyId
            FROM z5CandyPreferences cp
           WHERE PersonID=X.PersonID AND cp.[PrefernceFactor]=X.HighestPreference
                    ) CandyId,
                    HighestPreference as PreferenceFactor
               FROM ( SELECT PersonID, 
                             MAX(PrefernceFactor) AS HighestPreference
                        FROM z5CandyPreferences
                       GROUP BY PersonID
                    ) AS X
            ) AS Y on p.PersonId = Y.PersonId
 Inner Join z5Candies as c on c.CandyID=Y.CandyId

And the results:

 TableName          nRows
 ------------------ -------
 z5Persons          200,000
 z5Candies          150,000
 z5CandyPreferences 497,445


Query                       Rows Affected CPU time Elapsed time
--------------------------- ------------- -------- ------------
Emtucifor     (no joins)          183,289   531 ms     3,122 ms
John Saunders (no joins)          183,289 1,266 ms     2,918 ms
Emtucifor     (with joins)        183,289 1,031 ms     3,990 ms
John Saunders (with joins)        183,289 2,406 ms     4,343 ms


Emtucifor (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences           1         2,022 


John Saunders (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences     183,290       587,677

Emtucifor (with joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
Worktable                    0             0
z5Candies                    1           526
z5CandyPreferences           1         2,022
z5Persons                    1           733

John Saunders (with joins) 
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences      183292       587,912
z5Persons                    3           802
Worktable                    0             0
z5Candies                    3           559
Worktable                    0             0

Niikola 2009-06-30 08:14:23

Wow, thanks for doing that niikola. I did say in my post that mathematical conversions are best, but it seems you've put the preference to be least significance when it needs to be most significant? Also, may I suggest that 3,300 reads compared to 589,000 is a huge difference! For a very busy OLTP server, elapsed client time becomes less important (though still meaningful) and CPU time and reads become more important.

Emtucifor 2009-06-30 23:01:11

I agree elapsed time is not really relevant and could be ignored as depends on too many other things.3,300 against 589,000 reads looks like huge difference, but until they are just logical reads the real difference could be rather small (it depends of some other factors too). Nevertheless, I would always choose solution with less logical reads because in case of increased number of rows logical reads will sunner or later become physical ones.And you are right, I should use descending order for Preferences in index and queries as well as min aggregate function

Niikola 2009-07-01 15:56:15

Niikola, I'm saying that you're actually selecting the wrong rows--based on CandyID instead of the highest Preference.

Emtucifor 2009-07-01 22:31:53

oops, you are right, as CandyID is higher, it will always return Candy with highest ID.To correct this I should use for packing something like Cast(PreferenceFactor*1000 + CandyId/10000000000. as decimal(15,12))which makes query a bit slower.

Niikola 2009-07-07 11:32:49

Answer 11

A:

I tested also ROW_NUMBER() version + added additional index

Create index IX_z5CandyPreferences On z5CandyPreferences(PersonId,PrefernceFactor,CandyID)

Response times between Emtucifor's and ROW_NUMBER() version (with index in place) are marginal (if any - test should be repeated number of times and take averages, but I expect there would not be any significant difference)

Here is query:

Select p.PersonName,
       c.Candy,
       y.PrefernceFactor
  From z5Persons p
 Inner Join (Select * from (Select cp.PersonId,
       cp.CandyId,
       cp.PrefernceFactor,
       ROW_NUMBER() over (Partition by cp.PersonId Order by cp.PrefernceFactor, cp.CandyId ) as hp
  From z5CandyPreferences cp) X
   Where hp=1) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

and results with and without new index:

                           |     Without index    |      With Index
                           ----------------------------------------------
Query (Aff.Rows 183,290)   |CPU time Elapsed time | CPU time Elapsed time
-------------------------- |-------- ------------ | -------- ------------
Emtucifor     (with joins) |1,031 ms     3,990 ms |   890 ms     3,758 ms
John Saunders (with joins) |2,406 ms     4,343 ms | 1,735 ms     3,414 ms
ROW_NUMBER()  (with joins) |2,094 ms     4,888 ms |   953 ms     3,900 ms.


Emtucifor (with joins)         Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
Worktable          |         0             0 |          0             0
z5Candies          |         1           526 |          1           526
z5CandyPreferences |         1         2,022 |          1           990
z5Persons          |         1           733 |          1           733

John Saunders (with joins)     Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |    183292       587,912 |    183,290       585,570
z5Persons          |         3           802 |          1           733
Worktable          |         0             0 |          0             0
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          -             -


ROW_NUMBER() (with joins)      Without index |              With Index 
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |         3          2233 |          1           990
z5Persons          |         3           802 |          1           733
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          0             0

Niikola 2009-06-30 11:04:01

How many rows are in your tables?

Emtucifor 2009-06-30 23:03:23

you have it in my previous post

Niikola 2009-07-01 15:49:15

ansaurus

tags:

views:

answers:

Efficient SQL 2000 Query for Selecting Preferred Candy

related questions