views:

72

answers:

2

This is a baffling one. My ASP.NET 3.5 app that was working fine suddenly started getting timeout errors...

System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding

...but only for requests with the correct username/password in the connection string. If we deliberately messed up the connection string in the config, SQL Server would properly reject the request, but instantly, not after a 15 second wait (configured timeout), clearly indicating it had no problem communicating with the server.

We bounced the SQL Server box (SQL Server 2005, fully patched) and the IIS box, touched the web.config to force app restart, etc. but no luck. All requests would hang for 15 seconds then report that error. At all times we were able to reach the database server directly (management studio, monitoring tools), AND I was able to configure my local copy of the site inside Visual Studio to hit this same database server without incident. And this issue that came on suddenly by itself and lasted most of the day (errors started getting logged at 6:17am) suddenly resolved itself around 4:30pm.

It seemed to be acting as if there was a network routing issue between this one web server and this one database server, but only with a particular set of SQL credentials. I know this doesn't make any sense, but no does any other scenario we can imagine. I'm a very experienced developer and neither I nor our very experienced DBA nor sysadmins can find anything in event logs, monitoring irregularities, etc. that could give a sensible explanation to this bizarre and self-resolving set of symptoms.

This happened on our client's test environment, which was not ideal, but because we don't understand what the hell happened our concern is that this may suddenly pop up in production and we'll be beating our collective heads against the wall, so any ideas or wild theories are very much welcome.

The LINQ-to-SQL generated line of code that's blowing up is ExecuteMethodCall:

[Function(Name="dbo.spSetModelingNodeState")]
public int spSetModelingNodeState([Parameter(Name="NodeIdentifier", DbType="VarChar(60)")] string nodeIdentifier, [Parameter(Name="NodeStatus", DbType="Int")] System.Nullable<int> nodeStatus, [Parameter(Name="PoolWeighting", DbType="Float")] System.Nullable<double> poolWeighting)
{
    IExecuteResult result = this.ExecuteMethodCall(this, ((MethodInfo)(MethodInfo.GetCurrentMethod())), nodeIdentifier, nodeStatus, poolWeighting);
    return ((int)(result.ReturnValue));
}

and my code that calls that code is (abbreviated):

using (MyDataContext myDataContext = new MyDataContext(_connectionString))
{
    myDataContext.spSetModelingNodeState(
        Adapter.Identifier, // string
        (int)newHealthValue, // enum, cast to int
        PoolWeighting); // float
}

EDIT: Took a couple days to track down, but I've added log/Profiler info below, per commentor requests, and the body of the stored procedure in question

The SQL Server logs simply show logins and logouts and no indication of problem. The Profiler trace also has no smoking gun, but the following seems to show a statement taking 18 seconds. Follow spid 82. It begins at 4:37:50, does some things, then spid 83 records a login at 4:38:07. Spid 82 records its completion next, with the same 4:38:07 start time, but because these are logged sequentially, it must have actually occurred somewhere between 4:38:07 (previous trace statement) and 4:40:10 (following statement).

Spid   Starttime       Event               Query
82     4:37:50 PM      Login               -- network protocol: LPC  set quoted_identifier on  set arithabort off  set numeric_roundabort off  set ansi_warnings on  set ansi_padding on  set ansi_nulls on  set concat_null_yields_null on  set cursor_close_on_commit off  set implicit_transactions off
82     4:37:50 PM      SP:StmtStarting     EXEC @RETURN_VALUE = [dbo].[spSetModelingNodeState] @NodeIdentifier = @p0, @NodeStatus = @p1, @PoolWeighting = @p2
82     4:37:50 PM      SP:Starting         EXEC @RETURN_VALUE = [dbo].[spSetModelingNodeState] @NodeIdentifier = @p0, @NodeStatus = @p1, @PoolWeighting = @p2
82     4:37:50 PM      SP:StmtStarting     UPDATE    ModelingNodeState   SET    NodeStatus = @NodeStatus,    PoolWeighting = ISNULL(@PoolWeighting, PoolWeighting)   WHERE    NodeIdentifier = @NodeIdentifier     
83     4:38:07 PM      Login               -- network protocol: LPC  set quoted_identifier on  set arithabort off  set numeric_roundabort off  set ansi_warnings on  set ansi_padding on  set ansi_nulls on  set concat_null_yields_null on  set cursor_close_on_commit off  set implicit_transactions off
82     4:37:50 PM      SP:Completed        EXEC @RETURN_VALUE = [dbo].[spSetModelingNodeState] @NodeIdentifier = @p0, @NodeStatus = @p1, @PoolWeighting = @p2
80     4:40:10 PM      SP:StmtStarting     SELECT 'Server[@Name=' + quotename(CAST(serverproperty(N'Servername') AS sysname),'''') + ']' + '/Database[@Name=' + quotename(db_name(),'''') + ']' + '/Table[@Name=' + quotename(tbl.name,'''') + ' and @Schema=' + quotename(SCHEMA_NAME(tbl.schema_id),'''') + ']' AS [Urn], tbl.name AS [Name], SCHEMA_NAME(tbl.schema_id) AS [Schema], CAST(   case       when tbl.is_ms_shipped = 1 then 1      when (          select               major_id           from               sys.extended_properties           where               major_id = tbl.object_id and               minor_id = 0 and               class = 1 and               name = N'microsoft_database_tools_support')           is not null then 1      else 0  end                         AS bit) AS [IsSystemObject], tbl.create_date AS [CreateDate], stbl.name AS [Owner] FROM sys.tables AS tbl INNER JOIN sys.database_principals AS stbl ON stbl.principal_id = ISNULL(tbl.principal_id, (OBJECTPROPERTY(tbl.object_id, 'OwnerId'))) WHERE (CAST(   case       when tbl.is_ms_shipped = 1 then 1      when (          select               major_id           from               sys.extended_properties           where               major_id = tbl.object_id and               minor_id = 0 and               class = 1 and               name = N'microsoft_database_tools_support')           is not null then 1      else 0  end                         AS bit)=@_msparam_0) ORDER BY [Schema] ASC,[Name] ASC
80     4:40:10 PM      SQL:BatchStarting   use [master]
62     4:20:10 PM      Logout              NULL
55     4:20:02 PM      Logout              NULL
74     4:13:37 PM      Logout              NULL
59     4:20:10 PM      Logout              NULL
55     4:40:29 PM      Login               -- network protocol: TCP/IP  set quoted_identifier on  set arithabort off  set numeric_roundabort off  set ansi_warnings on  set ansi_padding on  set ansi_nulls on  set concat_null_yields_null on  set cursor_close_on_commit off  set implicit_transactions 
55     4:40:29 PM      SQL:BatchStarting   set transaction isolation level  read committed  set implicit_transactions off 
82     4:37:50 PM      Logout              NULL

This is definitely a deeper level of SQL debugging than I'm accustomed to, so please let me know if I'm misreading this, but it certainly appears like this statement was running without incident, just incredibly slowly. And again, it was running fine beforehand, started timing out, then resumed normal operation.

The only thing I can think of now that I've ruminated in public is that if there were some long-running locks jamming up the tables called by the sproc, that might explain it. Because LINQ-to-SQL somewhat hides the login/logout process, it's possible that the login was always working fine, and it's just this one sproc call that timed out due to a block. Whether those tables were locked, and if so, why, is impossible to say at this point. Does that sound like the most likely explanation for what I saw, or does anyone have another theory?

For completeness, here's the body of the sproc:

/****** Object:  StoredProcedure [dbo].[spSetModelingNodeState]    Script Date: 10/29/2010 14:37:46 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[spSetModelingNodeState]
(
@NodeIdentifier varchar(60),
@NodeStatus int,
@PoolWeighting float = NULL
)
AS
/*try to update existing row to new state*/
UPDATE
    ModelingNodeState
SET
    NodeStatus = @NodeStatus,
    PoolWeighting = ISNULL(@PoolWeighting, PoolWeighting)
WHERE
    NodeIdentifier = @NodeIdentifier

IF @@ROWCOUNT = 0
    /*not found, so insert new one*/
    INSERT
        ModelingNodeState(
            NodeIdentifier,
            PoolWeighting,
            NodeStatus,
            LastModelingResult)
    VALUES(
        @NodeIdentifier,
        ISNULL(@PoolWeighting, 1),
        0,
        NULL)

DECLARE @timestamp datetime
SET @timestamp = CURRENT_TIMESTAMP

/*fill endtime of previous node state*/
UPDATE 
    ModelingNodeStateLog
SET
    EndTime = @timestamp
WHERE
    EndTime IS NULL AND
    NodeIdentifier = @NodeIdentifier AND
    NodeStatus <> @NodeStatus

/*start a new entry in the log (yes, I saw I should remove the IF check and always insert, but that's how it currently is in the db)*/
IF @@ROWCOUNT <> 0
    INSERT
        ModelingNodeStateLog
        (
        NodeIdentifier,
        NodeStatus,
        StartTime
        )
    VALUES
        (
        @NodeIdentifier,
        @NodeStatus,
        @timestamp
        )

RETURN
A: 

I have seen this behavior after restoring a database backup from a different computer.

This can be fixed by a SQL command in management studio. See How to fix orphaned SQL Server users

Wolfgang
Thanks, but doesn't seem like this can be it. The login was working fine, then it wasn't for most of the day, then it was again. No restores were run.
Joel P.