views:

509

answers:

8

I am trying to extract application log records from a single table. The select statement is pretty straightforward.

select top 200000 * 
from dbo.transactionlog 
where rowid>7 
and rowid <700000 and 
Project='AmWINS'

The query time for the above select is over 5 minutes. Is that considered long? While the select is running, a bulk insert is also running.

A: 

It depends on your hardware. Pulling 200000 rows out while data is being inserted requires some serious IO, so unless you have a 30+ disk system, it will be slow.

Also, is your rowID column indexed? This will help with the select, but could slow down the bulk insert.

ck
A: 

I am not sure, but doesn't bulk insert in MS SQL lock the whole table?

Grzegorz Gierlik
It depends on the locking model. You may choose table lock or row lock when doing the bulk insert. Table lock is faster, but the table is unavailable for other requests. Row lock is slower, as only a single row is locked at a time.
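For illustration, a bulk insert that requests a table lock might look like the sketch below; the source file path is made up, and only the TABLOCK hint is the point. Leaving the hint out lets SQL Server fall back to finer-grained locks.

-- Sketch only: the file path is hypothetical; TABLOCK takes a table-level lock for the load.
BULK INSERT dbo.transactionlog
FROM 'C:\import\applog.dat'
WITH (TABLOCK);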
Scoregraphic
A: 

As ck already said, indexing is important, so make sure you have an appropriate index in place. I would set an index not only on rowId but also on Project. I would also rewrite the where-clause to:

WHERE Project = 'AmWINS' AND rowid BETWEEN 8 AND 699999

Reason: I guess Project is more restrictive than rowid and - correct me if I'm wrong - BETWEEN is faster than a < and > comparison.
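A sketch of such an index, assuming the table and column names from the question (the index name itself is made up):

-- Composite index: equality column (Project) first, then the range column (rowid).
CREATE NONCLUSTERED INDEX IX_transactionlog_Project_rowid
    ON dbo.transactionlog (Project, rowid);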

Scoregraphic
I think you'll find the execution engine would run the queries in the same way.
ck
I think it depends on the index(es) on the table. I taught myself to write the most restrictive columns first (a must for indices).
Scoregraphic
A: 

No amount of indexing will help here because it's a SELECT * query, so it's most likely a PK scan or a horrendous bookmark lookup.

And the TOP is meaningless because there is no ORDER BY.
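If a deterministic set of rows is wanted, an ORDER BY is needed; a minimal sketch using the rowid column from the question:

select top 200000 *
from dbo.transactionlog
where rowid > 7 and rowid < 700000 and Project = 'AmWINS'
order by rowid  -- makes the TOP deterministic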

The simultaneous insert is probably a red herring as far as I can tell, unless the table only has 2 columns and the bulk insert is locking the whole table. With a simple int IDENTITY column the insert and select may not interfere with each other at all.

Especially if the bulk insert is only a few thousand rows (or even tens of thousands).

Edit: the TOP and rowid values do not imply a million-plus rows.

gbn
A: 

How many fields are in your transactionlog table? Do you really need them all?

Also, if you could show us your execution plan for that query, we might be able to help more. Have you tried putting an index on the Project field?

Also, have you benchmarked the time against similar queries? It may help in pinpointing a problem.

eg.

select * from dbo.transactionlog where rowid>7 and rowid <700000 and Project='AmWINS'

select * from dbo.transactionlog where Project='AmWINS'

select top 200000 * from dbo.transactionlog
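One way to benchmark those runs is with SQL Server's timing and I/O statistics; a sketch of the session settings (the query itself is the one from the question):

SET STATISTICS TIME ON;  -- reports CPU and elapsed time per statement
SET STATISTICS IO ON;    -- reports logical and physical reads per table

select top 200000 * from dbo.transactionlog where rowid>7 and rowid <700000 and Project='AmWINS'

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;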
kevchadders
+2  A: 

It may be possible for you to use the NOLOCK table hint, as described here:

Table Hints MSDN

Your SQL would become something like this:

select top 200000 * from dbo.transactionlog with (nolock) ...

This would achieve better performance if you aren't concerned about the complete accuracy of the data returned.
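Applied to the full query from the question, it might look like the sketch below; NOLOCK reads uncommitted data, so dirty reads are possible:

select top 200000 *
from dbo.transactionlog with (nolock)  -- dirty reads possible
where rowid > 7
  and rowid < 700000
  and Project = 'AmWINS'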

Richard
+1 exactly; you're probably seeing lock escalation. But you should make sure that you have enough temp space too.
Richard
A: 

What are you doing with the 200,000 rows? Are you running this over a network? Depending on the width of your table, just getting that amount of data across the network could be the bulk of the time spent.

Tom H.
A: 

Actually, I am having a serious problem with my current production logging database. Basically, we only have one table (transactionlog); all the application logs are inserted into this table. For a project like AmWINS, based on the result of a select count, we have about 800K+ records inserted per day. Record insertion runs 24 hours a day in the production environment. Users would like to extract data from the table when they want to check the transaction logs, so we need to be able to select records out of the table when necessary.

I tried to simulate this in a UAT environment by pumping in the same volume as production, which has grown to 10 million records as of today. While extracting records, I ran a simultaneous bulk insert to make it look like the production environment. It took about 5 minutes just to extract 200K records.

While the extraction was running, I monitored the SQL physical server and the CPU spiked up to 95%.

The table has 13 fields and an identity column (rowid) of type bigint; rowid is the PK. Indexes are created on Date, Project, module and RefNumber. The table is created with row locks and page locks enabled. I am using SQL Server 2005.

Hope you guys can give me some professional advice to enlighten me. Thanks.