views:

116

answers:

4

alt text

Here's the scenario:

I've got a table called MarketDataCurrent (MDC) that has live updating stock prices.

I've got one process called 'LiveFeed' which reads prices streaming from the wire, queues up inserts, and uses a 'bulk upload to temp table then insert/update to MDC table.' (BulkUpsert)

I've got another process which then reads this data, computes other data, and then saves the results back into the same table, using a similar BulkUpsert stored proc.

Thirdly, there are a multitude of users running a C# Gui polling the MDC table and reading updates from it.

Now, during the day when the data is changing rapidly, things run pretty smoothly, but then, after market hours, we've recently started seeing an increasing number of Deadlock exceptions coming out of the database, nowadays we see 10-20 a day. The imporant thing to note here is that these happen when the values are NOT changing.

Here's all the relevant info:

Table Def:

CREATE TABLE [dbo].[MarketDataCurrent](
 [MDID] [int] NOT NULL,
 [LastUpdate] [datetime] NOT NULL,
 [Value] [float] NOT NULL,
 [Source] [varchar](20) NULL, 
CONSTRAINT [PK_MarketDataCurrent] PRIMARY KEY CLUSTERED 
(
 [MDID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

-

alt text

I've got a Sql Profiler Trace Running, catching the deadlocks, and here's what all the graphs look like.

alt text

Process 258 is called the following 'BulkUpsert' stored proc, repeatedly, while 73 is calling the next one:

ALTER proc [dbo].[MarketDataCurrent_BulkUpload]
 @updateTime datetime,
 @source varchar(10)
as

begin transaction

update c with (rowlock) set LastUpdate = getdate(), Value = t.Value, Source = @source 
from MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
where c.lastUpdate < @updateTime
and   c.mdid not in (select mdid from MarketData where LiveFeedTicker is not null     and     PriceSource like 'LiveFeed.%')
and   c.value <> t.value

insert  into MarketDataCurrent
with (rowlock)
select  MDID, getdate(), Value, @source from #MDTUP 
where mdid not in (select mdid from MarketDataCurrent with (nolock))
and  mdid not in (select mdid from MarketData where LiveFeedTicker is not null     and PriceSource like 'LiveFeed.%')

commit

And the other one:

ALTER PROCEDURE [dbo].[MarketDataCurrent_LiveFeedUpload] 
AS
begin transaction

 -- Update existing mdid
 UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source 
 FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;

 -- Insert new MDID
 INSERT INTO MarketDataCurrent with (ROWLOCK) SELECT * FROM #TEMPTABLE2 
 WHERE MDID NOT IN (SELECT MDID FROM MarketDataCurrent with (NOLOCK))

 -- Clean up the temp table
 DELETE #TEMPTABLE2

commit

To clarify, those Temp Tables are being created by the C# code on the same connection and are populated using the C# SqlBulkCopy class.

To me it looks like it's deadlocking on the PK of the table, so I tried removing that PK and switching to a Unique Constraint instead but that increased the number of deadlocks 10-fold.

I'm totally lost as to what to do about this situation and am open to just about any suggestion.

HELP!!


In response to the request for the XDL, here it is:

<deadlock-list>
 <deadlock victim="processc19978">
  <process-list>
   <process id="processaf0b68" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (d900ed5a6cc6)" waittime="718" ownerId="1102128174" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.750" XDES="0xffffffff817f9a40" lockMode="U" schedulerid="3" kpid="8228" status="suspended" spid="73" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.750" lastbatchcompleted="2010-06-11T16:30:44.750" clientapp=".Net SqlClient Data Provider" hostname="RISKAPPS_VM" hostpid="3836" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128174" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="MKP_RISKDB.dbo.MarketDataCurrent_BulkUpload" line="28" stmtstart="1062" stmtend="1720" sqlhandle="0x03000600a28e5e4ef4fd8e00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = getdate(), Value = t.Value, Source = @source 
FROM MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
WHERE c.lastUpdate &lt; @updateTime
and   c.mdid not in (select mdid from MarketData where BloombergTicker is not null and PriceSource like &apos;Blbg.%&apos;)
and   c.value &lt;&gt; t.value     </frame>
     <frame procname="adhoc" line="1" stmtstart="88" sqlhandle="0x01000600c1653d0598706ca7000000000000000000000000">
exec MarketDataCurrent_BulkUpload @clearBefore, @source     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
(@clearBefore datetime,@source nvarchar(10))exec MarketDataCurrent_BulkUpload @clearBefore, @source    </inputbuf>
   </process>
   <process id="processc19978" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (74008e31572b)" waittime="718" ownerId="1102128228" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.780" XDES="0x380be9d8" lockMode="U" schedulerid="5" kpid="8464" status="suspended" spid="248" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.780" lastbatchcompleted="2010-06-11T16:30:44.780" clientapp=".Net SqlClient Data Provider" hostname="RISKBBG_VM" hostpid="4480" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128228" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="MKP_RISKDB.dbo.MarketDataCurrentBlbgRtUpload" line="14" stmtstart="840" stmtend="1220" sqlhandle="0x03000600005f9d24c8878f00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source 
        FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;

        -- Insert new MDID     </frame>
     <frame procname="adhoc" line="1" sqlhandle="0x010006004a58132228bf8d73000000000000000000000000">
MarketDataCurrentBlbgRtUpload     </frame>
    </executionStack>
    <inputbuf>
MarketDataCurrentBlbgRtUpload    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock5ba77b00" mode="U" associatedObjectId="72057594090487808">
    <owner-list>
     <owner id="processc19978" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="processaf0b68" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock65dca340" mode="U" associatedObjectId="72057594090487808">
    <owner-list>
     <owner id="processaf0b68" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="processc19978" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>
  </resource-list>
 </deadlock>
</deadlock-list>
+1  A: 

It's happening after main business hours, the data isn't changing and it just started recently. Did anything recently change on the server? I would suspect some new database maintenance job might be interfering.

BTW if you know the market is closed and the data isn't changing, why is your process still running?

HLGEM
I manage the server and know everything that runs on it. There is a SQL DTSS backup job that does a full backup every night to the same physical hard drive, then another DTSS on another SQL server which copies from my server's local drive to a network 'Snap Server'
skimania
I'm oversimplifying about the data not updating at all. Most markets are closed, but FX is actually open 24/5 so we need to process that market.
skimania
A: 

Looking at the Primary Key constraint, what if i changed some of the flags:

FROM:

 CONSTRAINT [PK_MarketDataCurrent] PRIMARY KEY CLUSTERED 
 (
    [MDID] ASC
 ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
         ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
 ) ON [PRIMARY]

To:

 CONSTRAINT [PK_MarketDataCurrent] PRIMARY KEY CLUSTERED 
 (
    [MDID] ASC
 ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = **ON**, IGNORE_DUP_KEY = OFF, 
         ALLOW_ROW_LOCKS  = **ON**, ALLOW_PAGE_LOCKS  = **OFF**) ON [PRIMARY]
 ) ON [PRIMARY]

Anybody think that could help?

skimania
+2  A: 

The deadlock seems to be straight forward deadlocks on key access order. One trivial explanation is overlap of the updated keys between the two bulk update operations.

A less trivial explanation though is that in SQL Server (and other servers too) the locked keys are hashed, and there is a (pretty significant) hash collision probability. This would explain why you see more deadlocks lately compared with before: simply your volume of data has increased and therefore the collision probability has increased. If this seems esoteric and improbable, just read on at %%lockres%% collision probability magic marker: 16,777,215, and the article linked from that. The probability is surprisingly high, for a perfect key distribution you have 50% collision probability after only ~16M inserts. For normal, real world, key distributions you have significant collision probability at only few thousand inserts. Unfortunately, there is no work around. Your only solution, if this is truly the problem, is to reduce the size of the batches (the size of the #temp tables) so that the collision probability is reduced. Or deal with deadlocks and retry... which you'll have to do anyway, but at least you can deal with fewer deadlocks.

Remus Rusanu
This insight has been very helpful thus far. I can't accept any answers yet until I come up with a way to reduce these deadlocks to an acceptable level, but I think this is on the right track.Your article links to this one, which is very interesting:http://consultingblogs.emc.com/jamesrowlandjones/archive/2009/05/28/the-curious-case-of-the-dubious-deadlock-and-the-not-so-logical-lock.aspxHe mentions three possible solutions to this issue, of which #1 is1. Change your key to a surrogate integer based keyBut this is exactly what I have. The PK in this table is a unique int, nothing more
skimania
I actually disagree with James on that one. Changing the key to an int surrogate key will do exactly nothing, the 8 bytes int will still be hashed to the lock resource size, which is 6 bytes. Imho. the only reasonable approach is to reduce the batch sizes.
Remus Rusanu
Just ran:select %%lockres%% as LockHas, count(*) from MarketDataCurrentgroup by %%lockres%%having count(*) > 1(during the middle of the day)and got 0 rows!Our table isn't actually all that big. ~32,000 rows total (it's supposed to be very lightweight and only have real-time data)So, with a unique int id pk, and values in the range of up to 250,000 hash key collisions seem almost impossible as an explanation.I'll run this query again right after the next deadlock to see if there are any at that point, but I don't expect them.
skimania
Also, FWIW, the batch sizes are about 200-300 rows on one process and 5000 for the other.
skimania
Other question, do you know how I can resolve the actual IDs of the rows that are deadlocking from the XDL above?
skimania
With such small batches/table sizes, is not very likely to be statistical collision (is still possible though). Are you sure your updates don't overlap? Ie. not hash collision, but true key collision.
Remus Rusanu
`id="lock65dca340"`: to find the rows that deadlocked, you run `SELECT * FROM MKP_RISKDB.dbo.MarketDataCurrent WHERE substring(%%lockres%%, 6,8) = '65dca340'`.
Remus Rusanu
Surprisingly, that yielded no rows. I checked the main table (the one which has the definition of the MDID's which has the same values for the PK and it too yielded no rows. I also checked that the values %%lockres%% returns match for the key across the two tables. I also checked that they are entirely unique in the table. this means NO HASH COLLISIONS.Damn!It mus be page locks, right? What happens if I turn off page locks on the table like I suggested below?
skimania
I've tried my best to ensure that the updates don't overlap by adding `and c.mdid not in (select mdid from MarketData where LiveFeedTicker is not null and PriceSource like 'LiveFeed.%')`to the non live feed insert (which will only insert stuff in that set). I need to find the id's that are colliding - but that %%lockres%% isn't working. Could the hash algo change over time? Also, wouldn't the two conflicted keylock ID's match in the XDL if there was the same ID? The only way I can see a collision with two distinct keys is if they are on the same page...
skimania
+1  A: 

I'd like to answer one question I asked in a comment, which is,

"How do you identify the rows which are locking?".

In the following deadlock XDL, on the two "process" nodes which are locking, there is a waitresource attribute. in this case:

waitresource="KEY: 6:72057594090487808 (d4005c04b35f)

and

waitresource="KEY: 6:72057594090487808 (b00072ea4ffd)

Using the %%lockres%% keyword Remus pointed to,

select %%lockres%%, * from MarketDataCurrent 
   where %%lockres%% in ('(d4005c04b35f)', '(b00072ea4ffd)')

This yielded the two rows which are conflicting. they are indeed unique ids, and there is no collision. I still don't know why I'm getting a deadlock here, but I'm getting closer.

I will note that both of the id's are supposed to only be coming from the LiveFeed program, but again, there is a clause in the update that is supposed to be filtering out this row from actually updating from the other side.

<deadlock-list>
 <deadlock victim="processffffffff8f5872e8">
  <process-list>
   <process id="process8dcb68" taskpriority="0" logused="1256" waitresource="KEY: 6:72057594090487808 (d4005c04b35f)" waittime="1906" ownerId="1349627324" transactionname="user_transaction" lasttranstarted="2010-06-16T16:50:04.727" XDES="0x424e6258" lockMode="U" schedulerid="2" kpid="1004" status="suspended" spid="683" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-16T16:50:04.727" lastbatchcompleted="2010-06-16T16:50:04.727" clientapp=".Net SqlClient Data Provider" hostname="RISKAPPS_VM" hostpid="2600" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1349627324" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="MKP_RISKDB.dbo.MarketDataCurrent_BulkUpload" line="28" stmtstart="1062" stmtend="1720" sqlhandle="0x03000600a28e5e4ef4fd8e00849d00000100000000000000">
        UPDATE c WITH (ROWLOCK) SET LastUpdate = getdate(), Value = t.Value, Source = @source 
        FROM MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
        WHERE c.lastUpdate &lt; @updateTime
        and   c.mdid not in (select mdid from MarketData where BloombergTicker is not null and PriceSource like &apos;Blbg.%&apos;)
        and   c.value &lt;&gt; t.value     </frame>
             <frame procname="adhoc" line="1" stmtstart="88" sqlhandle="0x01000600c1653d0598706ca7000000000000000000000000">
        exec MarketDataCurrent_BulkUpload @clearBefore, @source     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">unknown</frame>
    </executionStack>
    <inputbuf>(@clearBefore datetime,@source nvarchar(10))exec MarketDataCurrent_BulkUpload @clearBefore, @source</inputbuf>
   </process>
   <process id="processffffffff8f5872e8" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (b00072ea4ffd)" waittime="1921" ownerId="1349627388" transactionname="user_transaction" lasttranstarted="2010-06-16T16:50:04.757" XDES="0x289ea040" lockMode="U" schedulerid="5" kpid="11192" status="suspended" spid="382" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-16T16:50:04.757" lastbatchcompleted="2010-06-16T16:50:04.757" clientapp=".Net SqlClient Data Provider" hostname="RISKBBG_VM" hostpid="2452" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1349627388" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="MKP_RISKDB.dbo.MarketDataCurrentBlbgRtUpload" line="14" stmtstart="840" stmtend="1220" sqlhandle="0x03000600005f9d24c8878f00849d00000100000000000000">
        UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source 
        FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;
    </frame>
     <frame procname="adhoc" line="1" sqlhandle="0x010006004a58132228bf8d73000000000000000000000000">
        MarketDataCurrentBlbgRtUpload     </frame>
    </executionStack>
    <inputbuf>
        MarketDataCurrentBlbgRtUpload    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock409d32c0" mode="U" associatedObjectId="72057594090487808">
    <owner-list>
     <owner id="processffffffff8f5872e8" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="process8dcb68" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock706647c0" mode="U" associatedObjectId="72057594090487808">
    <owner-list>
     <owner id="process8dcb68" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="processffffffff8f5872e8" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>
  </resource-list>
 </deadlock>
</deadlock-list>
skimania