views:

80

answers:

1

Hi,

we are facing the following problem and we are trying to come up with the best possible solution.

We are using SQL Server 2008. We have a table that has more than 600 millions records, and has about 25 columns. One of the columns is an ID and is indexed. We need to get a subset of records from this table. There are mainly 2 cases:

a) the subset contains anywhere between 1000 and 200 millions rows ; the IDs of the rows to get are stored in an indexed table in SQL Server;

b) the subset contains less than 1 million rows ; the IDs of the rows to get are stored in the application's server's memory (.NET web app).

Our challenge is to get this subset as fast as possible. We do need this to happen in a few seconds.

Our solutions so far:

a) we do a join between the two tables. That works but that is not fast enough. The query looks something like SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID.

b) we do not really have a solution. We tried to run a WHERE IN query but that simply takes forever if the subset approaches the million rows (SELECT * FROM Big_Table WHERE ID IN (ID1, ID2, ID3...)). A solution could be to store the indexes in a temporary table but then this falls back to case a.

We are trying to optimize SQL Server as much as we can, notably using good indexes and partioning. I am more asking today of what you think the best approach is to extract the subset in both cases (a and b).

Any idea or suggestion is appreciated.

Thanks

+1  A: 

The most obvious question for your problem a) (the SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID query) is: are you sure to have indices on the ID column in both tables?

marc_s
Yes all ID columns are indexed. It takes about 1 min to join the big table (600M) with a 100k index table. This is on a development server with non partioned tables, so I am expecting this time to be about halved on a production server with partioned tables. However even 30s would be too long.
Nicolas