views:

470

answers:

2

I am experiencing some strange behaviour in my Java server application whereby database operations that usually take a few milliseconds are sporadically taking much longer (30s - 170s) to complete. This isn't isolated to a specific query as I've seen the delays occurring for both SQL update and select statements. Also, all of my select statements use the NOLOCK option so I've ruled out possible lock contention.

The last time I saw a delay I managed to capture the following stack trace from JConsole; the update in question typically takes 5ms to complete but this stack trace was accessible for at least 10 - 20 seconds. The trace suggests to me that the statement has been executed but there is some delay in retrieving the result although I could be wrong? Obviously as this was an update statement the only result I'd expect would be the row count (i.e. not a large result set of data).

I saw a "transport level error" in SQL Server Management Studio at around the time of the delay.

One suggestion I've had is that these problems are due to SQL Server resources being exhausted. Has anyone seen anything similar? Can anyone shed any light on this problem?

Thanks in advance.

Stack Trace:

Name: MessageRouterImplThread-2
State: RUNNABLE
Total blocked: 0  Total waited: 224

Stack trace: 
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
com.microsoft.util.UtilSocketDataProvider.getArrayOfBytes(Unknown Source)
com.microsoft.util.UtilBufferedDataProvider.cacheNextBlock(Unknown Source)
com.microsoft.util.UtilBufferedDataProvider.getArrayOfBytes(Unknown Source)
com.microsoft.jdbc.sqlserver.SQLServerDepacketizingDataProvider.signalStartOfPacket(Unknown Source)
com.microsoft.util.UtilDepacketizingDataProvider.getByte(Unknown Source)
com.microsoft.util.UtilByteOrderedDataReader.readInt8(Unknown Source)
com.microsoft.jdbc.sqlserver.tds.TDSRequest.getTokenType(Unknown Source)
com.microsoft.jdbc.sqlserver.tds.TDSRequest.processReply(Unknown Source)
com.microsoft.jdbc.sqlserver.SQLServerImplStatement.getNextResultType(Unknown Source)
com.microsoft.jdbc.base.BaseStatement.commonTransitionToState(Unknown Source)
com.microsoft.jdbc.base.BaseStatement.postImplExecute(Unknown Source)
com.microsoft.jdbc.base.BasePreparedStatement.postImplExecute(Unknown Source)
com.microsoft.jdbc.base.BaseStatement.commonExecute(Unknown Source)
com.microsoft.jdbc.base.BaseStatement.executeUpdateInternal(Unknown Source)
com.microsoft.jdbc.base.BasePreparedStatement.executeUpdate(Unknown Source)
   - locked com.microsoft.jdbc.sqlserver.SQLServerConnection@c4b83f
org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:101)
org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:798)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:591)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:792)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:850)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:858)
org.springframework.jdbc.core.simple.SimpleJdbcTemplate.update(SimpleJdbcTemplate.java:237)
+1  A: 

"...whereby database operations that usually take a few milliseconds are sporadically taking much longer (30s - 170s) to complete."

What you are describing sounds like an incorrectly cached query plan, due to out of date statistics (and/or indexes that need rebuilding), or inorrect parameter sniffing. The timeout could be occuring because the server is taking longer than the default connection timeout.

I would talk to your DBA and first get statistics updated, and if that doesn't work get the indexes of the tables involved in the query rebuilt.

Run this on your Database (with the usual caveat about not runninhg in Production without talking to your admin/DBA, and run at own risk etc.):

EXEC sp_updatestats

EXEC sp_refreshview 

EXEC sp_msForEachTable 'EXEC sp_recompile ''?'''

Alternatively, you mention time of day being a factor. Could it be that a backup or scheduled job is occuring at that time?

Update: You could kick off a profiler trace: MS SQL Server 2008 - How Can I Log and Find the Most Expensive Queries? but don't restrict to your DB. Such a trace, as long as it is started from SSMS as per that post, is relatively low impact (3-5% ish).

Mitch Wheat
Mitch, Currently the tables are so small that even an incorrect query (i.e. table scan) would be negligable. So I'm more inclined to think that a backup job (or something else) may be causing a drain on server resources. Unfortunately it's a shared production server so any culprit is going to be difficult to track down.
Adamski
@Adamski: perhaps you could run a trace. I'll updated my answer with details...
Mitch Wheat
+1  A: 

The "transport level error" seems to indicate connectivity problems. Is the database on a separate machine?

Jeroen van Bergen
" Is the database on a seperate machine?" - they usually are! :)
Mitch Wheat
Yes it is on a separate machine (although my dev one isn't!). In the past a "transport level error" in Management Studio that won't go away (i.e. if I rerun the query from the IDE) has been indicative of a network problem. However, this error goes away when I rerun the same query so I'm wondering if this points more towards a resource issue.
Adamski