I am getting a large text file of updated information from a customer that contains updates for 500,000 users. However, as I process this file, I often run into SQL Server timeout errors.

Here's the process I follow in my VB application that processes the data (in general):

  1. Delete all records from the temporary table, to remove last month's data (e.g. DELETE FROM tempTable)
  2. Rip the text file into the temp table
  3. Fill in extra information into the temp table, such as their organization_id, their user_id, group_code, etc.
  4. Update the data in the real tables based on the data computed in the temp table

The problem is that I often run commands like UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id), and these commands frequently time out. I have tried bumping the timeout up as far as 10 minutes, but they still fail. Now, I realize that 500k rows is no small number of rows to manipulate, but I would think that a database purported to handle millions and millions of rows should be able to cope with 500k pretty easily. Am I doing something wrong in how I am going about processing this data?

Please help. Any and all suggestions welcome.

A: 

There are more efficient ways of importing large blocks of data. Look in SQL Server Books Online under bcp (the Bulk Copy Program).
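
The same idea is available from T-SQL via BULK INSERT; a minimal sketch, where the file path and delimiters are assumptions to adjust to your actual layout:

    -- Load the customer file into the staging table in one bulk operation.
    -- The path and delimiters below are hypothetical.
    BULK INSERT tempTable
    FROM 'C:\imports\customer_updates.txt'
    WITH (
        FIELDTERMINATOR = ',',   -- column delimiter in the text file
        ROWTERMINATOR = '\n',    -- row delimiter
        TABLOCK                  -- table lock allows a faster, minimally logged load
    );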

Jekke
In this case, the import is fine. What we are having trouble with is manipulating the data and adding to it once we get it into SQL Server.
cdeszaq
As I read it, it's not the copying of the data but the enriching of it that is giving timeout problems.
extraneon
+1  A: 

Needs more information. I am regularly manipulating 3-4 million rows in a 150 million row table and I do NOT consider this a lot of data. I have a "products" table that contains about 8 million entries - including full text search. No problems there either.

Can you elaborate on your hardware? I assume "normal desktop PC" or "low-end server", both with absolutely non-optimal disk layout, and thus tons of IO problems - on updates.

TomTom
+1  A: 

Are you indexing your temp table after importing the data?

tempTable.external_id should definitely have an index, since it is used in the WHERE clause.
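
A minimal sketch, using the table and column names from the question:

    -- Create the lookup index after the bulk load, so the enrichment
    -- UPDATEs can seek on external_id instead of scanning 500k rows.
    CREATE INDEX IX_tempTable_external_id ON tempTable (external_id);

    -- The join side matters too: myUsers.external_id should be indexed
    -- (it may already be, if it is a key).
    CREATE INDEX IX_myUsers_external_id ON myUsers (external_id);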

dan
There are a couple of indexes on the temp table, but they are on rather untouched fields.
cdeszaq
+5  A: 

Subqueries like the one you give us in the question:

UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id) 

are only good for one row at a time, so you must be looping. Think set-based:

UPDATE t
    SET user_id = u.user_id
    FROM tempTable         t
        INNER JOIN myUsers u ON t.external_id = u.external_id

and remove your loops; this will update all rows in one statement and be significantly faster!

KM
The update statement is not being run in a loop...to update a field in all rows of the table, we are only firing off 1 command to SQL Server, as I indicated above. We did it in a loop at first, but switched because it was about an order of magnitude faster to use 1 command.
cdeszaq
+1: I was just gonna write the same query. You beat me to it.
Numenor
@cdeszaq: your query itself is like a for loop, since it has a subquery that runs for each row in tempTable.
Numenor
+1: I was about to add the same example. This row-by-row subquery is much slower than a join.
Hogan
So is it the case then that the SET clause works much like a SELECT clause, but instead of returning those fields, it updates them? And I assume the WHERE clause works the same way, in that it can filter on anything that has been joined in?
cdeszaq
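
Yes on both counts: the SET takes its values from the joined rows much as a SELECT list would return them, and the WHERE can filter on any table in the join. A minimal sketch, using a hypothetical is_active column on myUsers:

    -- Same join as above, but only rows whose matching user is active
    -- get updated. is_active is an illustrative column, not from the question.
    UPDATE t
        SET user_id = u.user_id
        FROM tempTable         t
            INNER JOIN myUsers u ON t.external_id = u.external_id
        WHERE u.is_active = 1;
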
+1  A: 

Make sure you have indexes on the tables you are doing the selects from. In your example UPDATE command, you select the user_id from the myUsers table. Do you have an index with the user_id column on the myUsers table? The downside of indexes is that they increase the time for inserts and updates. Make sure you don't have indexes on the tables you are trying to update. If the tables you are trying to update do have indexes, consider dropping them and then rebuilding them after your import.
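
A minimal sketch of that drop-and-rebuild pattern, with a hypothetical index name:

    -- Drop the index so the import does not pay index maintenance per row.
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'IX_tempTable_group_code'
                 AND object_id = OBJECT_ID('tempTable'))
        DROP INDEX IX_tempTable_group_code ON tempTable;

    -- ... bulk load and enrichment updates run here ...

    -- Rebuild once at the end instead of maintaining the index row by row.
    CREATE INDEX IX_tempTable_group_code ON tempTable (group_code);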

Finally, run your queries in SQL Server Management Studio and have a look at the execution plan to see how the query is being executed. Look for things like table scans to see where you might be able to optimize.
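
If you would rather capture the plan from a query window, one option is SHOWPLAN; a sketch, shown here with the join form of the update from KM's answer:

    -- Return the estimated execution plan as XML instead of running the batch.
    SET SHOWPLAN_XML ON;
    GO
    UPDATE t
        SET user_id = u.user_id
        FROM tempTable         t
            INNER JOIN myUsers u ON t.external_id = u.external_id;
    GO
    SET SHOWPLAN_XML OFF;
    GO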

TLiebe
+1  A: 

Look at KM's answer and don't forget about indexes and primary keys.
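
For instance, if tempTable currently has no key at all, something like this could help (a sketch; it assumes external_id is unique in the file):

    -- A clustered primary key gives the optimizer a sorted access path
    -- and avoids repeated heap scans during the enrichment updates.
    ALTER TABLE tempTable
        ADD CONSTRAINT PK_tempTable PRIMARY KEY CLUSTERED (external_id);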