views:

91

answers:

4

I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year column (ex, all records from march 2010), and so we frequently need to get the available month and year combinations (ie do we have records for april 2010?).

A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY.

What do you think is the correct way of solving this problem? Is there a better way than having two tables?

+7  A: 

Using a simple index on the columns required (Year and Month) should greatly improve either a DISTINCT, or GROUP BY Query.

I would not go with a secondary table as this adds extra over head to maintaining the secondary table (inserts/updates deletes will require that you validate the secondary table)

EDIT:

You might even want to consider using Improving Performance with SQL Server 2005 Indexed Views

astander
+1. Dont even think about having another table!!
Aseem Gautam
I suggested an index, but was told that a distinct/group by would still be slow on a table with a few million records
derivation
Agreed. The secondary table is a bad idea -- not just from a hypothetical "this is not normalized" standpoint, but from an unintended maintenance consequences standpoint. Create an index and be done with it!
Bob Kaufman
+1  A: 

create a materialized indexed view of:

SELECT DISTINCT
    MonthCol, YearCol
    FROM YourTable

you will now get access to the pre-computed distinct values without going through the work every time.

KM
this adds an overhead for insert/updateand if the table grows about 100k-150k records by month it will be a big overhead.I would love to know that this high selection on this columns is not due to checking that the line exist's before inserting or updating it.
Gabriel Guimarães
@Gabriel Guimarães, I answered assuming that they had the index in place and that it was still slow. This view will make the select just about instant. However, there is no free lunch, you gain massive select speed for some insert/update/delete overhead (150k per month is not that many per second). OP says that they `frequently need to get the available month and year combinations` which would then use this view, and free up resources and possibly even help any transactions writing to this table.
KM
+1  A: 

Make the date the first column in the table's clustered index key. This is very typical for historic data, because most, if not all, queries are interested in specific ranges and a clustered index on time can address this. All queries like 'month of May' need to be addressed as ranges, eg: WHERE DATECOLKEY BETWEEN '05/01/2010' AND '06/01/2001'. Answering a question like 'are there any records in May' will involve a simple seek into the clustered index.

While this seems complicated for a programmer mind, it is the optimal way to approach a database design problem.

Remus Rusanu
+1  A: 

Make sure to have an Clustered Index on those columns. and partition your table on these date columns an place the datafiles on different disk drives I Believe keeping your index fragmentation low is your best shot.

I also Believe having a physical view with the desired select is not a good idea, because it adds Insert/Update overhead. on average there's 3,5 insert's per minute. or about 17 seconds between each insert (on average please correct me if I'm wrong)

The question is are you selecting more often than every 17 seconds? That's the key thought. Hope it helped.

Gabriel Guimarães