I’ve had an open ticket with Microsoft on a timeout issue we are having for over a month now with no resolution. I’m throwing the problem out to see if anybody else is experiencing this and any suggested work-arounds.
SS2005, SP3 database running on WinSrvr2003 cluster with NETAPP SAN.
Problem is that we are experiencing dotnet application timeouts when doing a NetAPP SnapManager full database snapshot backup. The NetApp SnapManager usings the standard VSS API to get a snapshot image. The timeouts are intermittent in nature and affect several of our dotnet applications including web services and our internal document processing (batch mode). More than 95% of the application timeouts occur during the “prepare” phase of the snapshot backup just prior to freezing the database. The timeouts do not occur during the freeze phase, but just prior to that as the VSS method is trying to get the database in a consistent state.
We do about 20 full database snapshot backups a day for a rolling ETL process to update our data warehouse. The primary advantage of the snapshot backup is that we had a “consistent” database and do not have to worry about transit data modifications during the data extract of the ETL phase.
The timeout stack traces indicate both connection timeouts and sql timeouts. The sql timeouts are from very simplistic stored procedure calls that should never time out.
Both my boss and I think that there is some sort of blocking process (that I cannot capture in a trace) from the VSS method that is causing the problem.
I’ve listed a typical combined portion of the NETApp SnapManager log and our application systemlog below to illustrate the sequence of events:
[14:07:46.176] Starting VDI backup thread for database #1: [EXCL1VS1\NGSCORE : NgsCore]... [14:07:46.179] Sending SQL command to instance EXCL1VS1\NGSCORE for database NgsCore : device NTAPDEV__NGSCORE_NgsCore... [14:07:46.179] Receiving by server [EXCL1VS1\NGSCORE]... [14:07:46.180] BACKUP DATABASE [NgsCore] TO VIRTUAL_DEVICE='NTAPDEV__NGSCORE_NgsCore' WITH SNAPSHOT [14:07:46.180] Waiting for SQLServer to respond... [14:07:46.189] Performing data transfer of [EXCL1VS1\NGSCORE : NgsCore]... From application systemlog: 2010-07-21 14:08:13.4 Stack trace: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.ReadBuffer() at System.Data.SqlClient.TdsParserStateObject.ReadByte() at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest) at System.Data.SqlClient.SqlInternalConnectionTds.PropagateTransactionCookie(Byte[] cookie) at System.Data.SqlClient.SqlInternalConnection.EnlistNull() at System.Data.SqlClient.SqlInternalConnection.Enlist(Transaction tx) at System.Data.SqlClient.SqlInternalConnectionTds.Activate(Transaction transaction) at System.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction) at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject) at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) at System.Data.SqlClient.SqlConnection.Open() at Newgistics.Common.Persistence.DalCommon.SetOpenConnection(SqlCommand Command) at Newgistics.Entity.DalEntity.ReadReferencesByURN(String ClassName, StringDictionary URNs, DataSet References) at Newgistics.Entity.Entity.GetByURN[BusinessKeyType](Type ClassType, StringDictionary URNs, ReferenceList EntityReferences) at Newgistics.Entity.Org.GetByURN(StringDictionary URNs, ReferenceList EntityReferences) at Newgistics.Entity.Newgistics..ctor() --- End of inner exception stack trace --- at System.RuntimeTypeHandle.CreateInstance(RuntimeType type, Boolean publicOnly, Boolean noCheck, Boolean& canBeCached, RuntimeMethodHandle& ctor, Boolean& bNeedSecurityCheck) at System.RuntimeType.CreateInstanceSlow(Boolean publicOnly, Boolean fillCache) at System.RuntimeType.CreateInstanceImpl(Boolean publicOnly, Boolean skipVisibilityChecks, Boolean fillCache) at System.Activator.CreateInstance(Type type, Boolean nonPublic) at Newgistics.Application.Services.ServiceCoordinator.GetProvider(Service service, Org org) [14:08:18.540] SQL Server is prepared to freeze the database now [EXCL1VS1\NGSCORE : NgsCore]...
[14:08:18.692] Ready to prepare to create snapshot...
[14:08:18.693]
[14:08:18.693] SNAPSHOT DATABASE
In almost every case where we have an application timeout (dotnet default is 30 seconds) it occurs between the start of the “Waiting for SQL Server to respond…” step and the “SQL Server is prepared to freeze the database now” step. Note that the time between the 2 events is just over 32 seconds.
So I am wondering if anybody else has seen the problem and second if anybody has any ideas about a work-around. My company is not prepared to reduce the number of backups needed during the day.