ansaurus

Question

Best way to get distinct values from large table

Answer 1

+7 A:

A simple index on the required columns (Year and Month) should greatly improve either a DISTINCT or a GROUP BY query.

I would not go with a secondary table, as it adds the overhead of maintaining that table (every insert, update, and delete will require you to keep the secondary table in sync).
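As a minimal sketch of that suggestion, assuming a table named dbo.YourTable with columns YearCol and MonthCol (hypothetical names; the index name is also made up for illustration):

```sql
-- Assumed names: dbo.YourTable, YearCol, MonthCol, IX_YourTable_Year_Month.
-- A narrow nonclustered index that covers both queries below.
CREATE NONCLUSTERED INDEX IX_YourTable_Year_Month
    ON dbo.YourTable (YearCol, MonthCol);

-- Either form can be answered from the index alone.
SELECT DISTINCT YearCol, MonthCol
FROM dbo.YourTable
ORDER BY YearCol, MonthCol;

SELECT YearCol, MonthCol
FROM dbo.YourTable
GROUP BY YearCol, MonthCol
ORDER BY YearCol, MonthCol;
```

With such an index the engine can read the narrow index instead of the full table rows. It still has to read every index entry, so the query gets cheaper rather than free.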

EDIT:

You might even want to consider an indexed view; see the article "Improving Performance with SQL Server 2005 Indexed Views".

astander 2010-04-21 18:21:09

+1. Don't even think about having another table!!

Aseem Gautam 2010-04-21 18:23:07

I suggested an index, but was told that a DISTINCT/GROUP BY would still be slow on a table with a few million records.

derivation 2010-04-21 18:25:23

Agreed. The secondary table is a bad idea -- not just from a hypothetical "this is not normalized" standpoint, but from an unintended maintenance consequences standpoint. Create an index and be done with it!

Bob Kaufman 2010-04-21 18:25:30

Answer 2

+1 A:

Create a materialized (indexed) view of:

SELECT DISTINCT
    MonthCol, YearCol
    FROM YourTable

You will now get access to the pre-computed distinct values without going through the work every time.
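One caveat when turning this into an actual indexed view: SQL Server does not allow DISTINCT in an indexed view definition, so the usual workaround is GROUP BY together with COUNT_BIG(*). A rough sketch, assuming the same placeholder names as above plus a hypothetical view name dbo.vw_YearMonth, and assuming the session SET options required for indexed views (such as ANSI_NULLS and QUOTED_IDENTIFIER ON) are in effect:

```sql
-- Assumed names: dbo.YourTable, YearCol, MonthCol, dbo.vw_YearMonth.
-- SCHEMABINDING and two-part table names are required for indexed views.
CREATE VIEW dbo.vw_YearMonth
WITH SCHEMABINDING
AS
SELECT YearCol, MonthCol, COUNT_BIG(*) AS RowCnt  -- COUNT_BIG(*) is mandatory with GROUP BY
FROM dbo.YourTable
GROUP BY YearCol, MonthCol;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_YearMonth
    ON dbo.vw_YearMonth (YearCol, MonthCol);
GO

-- One row per year/month combination, read from the materialized result.
-- NOEXPAND makes the optimizer use the view's index directly.
SELECT YearCol, MonthCol
FROM dbo.vw_YearMonth WITH (NOEXPAND);
```

The view holds one row per year/month combination, so reading it stays cheap however large the base table grows. The cost is that each insert, update, or delete on the base table also has to maintain the matching RowCnt.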

KM 2010-04-21 18:30:59

This adds overhead for inserts and updates, and if the table grows by about 100k-150k records per month, that overhead will be significant. I would also like to be sure that the heavy selection on these columns is not caused by checking whether a row exists before inserting or updating it.

Gabriel Guimarães 2010-04-21 18:56:24

@Gabriel Guimarães, I answered assuming that they had the index in place and that it was still slow. This view will make the select just about instant. However, there is no free lunch: you gain massive select speed in exchange for some insert/update/delete overhead (150k per month is not that many per second). OP says that they `frequently need to get the available month and year combinations`, which would then use this view, free up resources, and possibly even help any transactions writing to this table.

KM 2010-04-21 19:06:52

Answer 3

+1 A:

Make the date the first column in the table's clustered index key. This is very typical for historic data, because most, if not all, queries are interested in specific ranges, and a clustered index on time can address this. All queries like 'month of May' need to be expressed as ranges, e.g. WHERE DATECOLKEY BETWEEN '05/01/2010' AND '06/01/2010'. Answering a question like 'are there any records in May' will involve a simple seek into the clustered index.

While this may seem complicated to a programmer's mind, it is the optimal way to approach a database design problem.
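A small sketch of this approach, assuming the table is dbo.YourTable, the date column is DateColKey, and the table does not already have a clustered index (all names here are placeholders):

```sql
-- Assumed names: dbo.YourTable, DateColKey, CIX_YourTable_DateColKey.
-- Date leads the clustered key, so date ranges become index seeks.
CREATE CLUSTERED INDEX CIX_YourTable_DateColKey
    ON dbo.YourTable (DateColKey);

-- "Are there any records in May 2010?" as a half-open range.
-- Using >= and < avoids also matching midnight on June 1,
-- which an inclusive BETWEEN would include.
IF EXISTS (
    SELECT 1
    FROM dbo.YourTable
    WHERE DateColKey >= '20100501'
      AND DateColKey <  '20100601'
)
    PRINT 'May 2010 has records';
```

The 'yyyymmdd' literal form is used because it is read the same way regardless of the session's language and date format settings.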

Remus Rusanu 2010-04-21 19:01:46

Answer 4

+1 A:

Make sure to have a clustered index on those columns, partition your table on these date columns, and place the data files on different disk drives. I believe keeping your index fragmentation low is your best shot.

I also believe that having a physical view with the desired select is not a good idea, because it adds insert/update overhead. On average there are 3.5 inserts per minute, or about 17 seconds between each insert (please correct me if I'm wrong).
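A hedged sketch of what that setup might look like. The filegroups FG_Old, FG_2010, and FG_2011 are hypothetical and would have to exist already, each backed by files on a different drive; the table and column names are placeholders, and YearCol is assumed to be an int:

```sql
-- Assumed names: dbo.YourTable, YearCol, MonthCol, pfYear, psYear,
-- and filegroups FG_Old, FG_2010, FG_2011 (must be created beforehand).

-- Three partitions: years before 2010, the year 2010, and 2011 onward.
CREATE PARTITION FUNCTION pfYear (int)
    AS RANGE RIGHT FOR VALUES (2010, 2011);

-- Map each partition to its own filegroup.
CREATE PARTITION SCHEME psYear
    AS PARTITION pfYear TO (FG_Old, FG_2010, FG_2011);

-- Clustered index on the date columns, stored on the partition scheme.
CREATE CLUSTERED INDEX CIX_YourTable_Year_Month
    ON dbo.YourTable (YearCol, MonthCol)
    ON psYear (YearCol);

-- Check fragmentation per index and partition.
SELECT index_id, partition_number, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'LIMITED');
```

The boundary values are only an example; a real layout would pick boundaries that match how the data is queried and archived.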

The question is: are you selecting more often than every 17 seconds? That's the key consideration. Hope this helps.
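The figures above can be checked with a quick calculation, assuming the upper estimate of 150,000 inserts per month and a 30-day month (both are assumptions for the sake of the arithmetic):

```sql
-- Assumes 150,000 inserts per month and a 30-day month.
SELECT
    150000.0 / (30 * 24 * 60)        AS InsertsPerMinute,      -- minutes in 30 days = 43,200
    (30 * 24 * 60 * 60) / 150000.0   AS SecondsBetweenInserts; -- seconds in 30 days = 2,592,000
```

This works out to roughly 3.5 inserts per minute and a little over 17 seconds between inserts, which agrees with the estimate. At the lower figure of 100k per month the gap would be longer still.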

Gabriel Guimarães 2010-04-21 21:36:28
