I am attempting to select a distinct list where duplicates are created over several fields. For example,

SELECT tablename.field1Date, 
       tablename.field2Number, 
       tablename.field3Text 
FROM tablename;

This returns rows that are duplicated across the date, number, and text fields.

Now, when I select distinct records to provide what I am looking for, the performance seems to decrease dramatically.

SELECT DISTINCT tablename.field1Date,
                tablename.field2Number, 
                tablename.field3Text 
FROM tablename;

Are there any known reasons for this? I must admit I am using MS Access 2003, which may be the issue.

+8  A: 

Yes. Basically, it has to sort the results and then reprocess them to eliminate the duplicates. The cull could also be happening during the sort, but we can only speculate about how exactly the code works in the background. You could try to improve performance by creating an index composed of all three fields.
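
As a rough sketch of that suggestion, a compound index over the three fields from the question could be created with a DDL statement along these lines. The index name idxDistinctExample is made up for illustration, and whether the engine actually uses the index for the DISTINCT is not guaranteed, so it is worth timing the query before and after. The Access query editor does not accept SQL comments, so drop the comment line before running it there.

```sql
-- idxDistinctExample is a hypothetical name; the table and field names come from the question
CREATE INDEX idxDistinctExample
ON tablename (field1Date, field2Number, field3Text);
```

The field order follows the SELECT list in the question. If the index does not help, it can be removed again with DROP INDEX idxDistinctExample ON tablename.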

CodeSlave
Do you think there is a performance difference between a compound index on all three fields and individual indexes on each field?
David-W-Fenton
I would expect that there would be some improvement. Otherwise a compound index's only use would be for ensuring uniqueness, rather than helping with searches for n-tuples.
CodeSlave
+1  A: 

Yes, the application needs to compare every record to the "distinct" records cache as it goes. You can improve performance by using an index, particularly on the numeric and date fields.

Diodeus
The operation you describe is O(n²). It's therefore more likely that CodeSlave's answer is correct, since sorting only takes O(n log n) and eliminating duplicates from a sorted list is O(n).
David Schmitt
+4  A: 

This page has tips on improving query performance, along with some information on using the Performance Analyzer. It will tell you which indexes, if any, are needed.

http://support.microsoft.com/kb/209126

Erik Nedwidek