I have a SQL Server 2008 database with approximately 14 million rows. It contains two tables:
Table1
    rowId int, primary key
    someData1 int
    someData2 int
    ...
Table2
    id int, primary key
    rowId int, refers to rowId in Table1
    someCalculatedData int
    ...
Table2.rowId is not a foreign key, but I did create a non-clustered, non-unique index on it.
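For reference, the index described above (with an assumed name, IX_Table2_rowId) would have been created with something like:

```sql
-- Non-clustered, non-unique index on the lookup column;
-- the index name here is illustrative, not the actual one.
CREATE NONCLUSTERED INDEX IX_Table2_rowId
    ON Table2 (rowId);
```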
When I import the data I insert everything into Table1, then find the rows that don't yet have an entry in Table2 and insert the calculated data for them. I was doing this in batches: select 250,000 rows at a time, process them, insert the results into Table2, then fetch the next 250,000 rows, and so on until no rows in Table1 were left without an entry in Table2.
SELECT TOP 250000 rowId FROM Table1
WHERE NOT EXISTS (SELECT rowId FROM Table2 WHERE Table1.rowId = Table2.rowId)
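One keyset-style variant of the batch query above, sketched under the assumptions that rowId is assigned in increasing order and that a variable (here called @lastRowId) carries the largest rowId handled in the previous batch, would let each fetch seek directly past the rows already processed instead of re-scanning them:

```sql
-- @lastRowId is a hypothetical variable tracking the previous batch's
-- maximum rowId; start it at 0 for the first batch.
DECLARE @lastRowId int = 0;

SELECT TOP 250000 rowId
FROM Table1
WHERE rowId > @lastRowId
  AND NOT EXISTS (SELECT 1 FROM Table2 WHERE Table2.rowId = Table1.rowId)
ORDER BY rowId;  -- makes TOP deterministic and enables a range seek
```

After each batch, @lastRowId would be set to the largest rowId just returned before fetching the next batch.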
This query has really slowed down and now takes over two minutes to return the next batch. If I run the query without the TOP clause (or SET ROWCOUNT) and fetch all of the rows, it returns the results in ~15 seconds.
Does anyone know why TOP and SET ROWCOUNT make the query take so much longer than fetching all of the data?
Can I improve the performance of the query while still fetching only a subset of the rows each time?