ansaurus

Question

Help troubleshooting SqlException: Timeout expired on connection, in a non-load situation

Answer 1

+1 A:

I would compare the timestamp of the timeout with the execution time of your nightly backup. If they coincide, you could set your RSS feed to be static for that time.

Another thing to try (even though it isn't exactly an answer) is to immediately run sp_who when you get a timeout exception. It won't catch everything (the offending process could be done by the time you run this) but you may get lucky.

You can also fire up SQL Profiler when you head home for the night and step through the activity the next morning if you see the error again. Just be sure to not run it from the server itself (I'm pretty sure it reminds you of this when it starts).

EDIT: Addressing your update.

Is EF updating/creating its cache? It could explain the abundance of queries at one time and why no queries had database hits later.

Other than that, it appears you have a heisenbug. The only thing I can think for you to add is a lot more logging (to a file or the event log).

Austin Salonen 2009-09-14 14:43:43

Thanks for the idea, but it doesn't seem to be that. Our backup runs at 8 AM, and none of these errors happened around that time. Also, our database is REALLY small (.bak file is 2Mb), so I doubt this could take a lot of time to run...

Daniel Magliola 2009-09-14 15:36:18

Is it your System Backups that run at 8 AM or your SQL Server Backups? These are normally different tools and/or processes that run at different times (though some backup products can allow you to synchonize the tow, they are still different steps at different times).

RBarryYoung 2009-09-29 13:09:01

"tow" should be "two" (can't tyep)..

RBarryYoung 2009-09-29 13:09:44

Answer 2

A:

It smells a cronned thing that runs at the same time. As RBarryYoung says.. some nightly backup or it could be something else Do you have root access to the server? Can you see the crontabs?

Could it be some full text indexing plugin on top of the SQL server that runs its reindexing procedures close to the time you are experiencing the issues?

trandism 2009-09-29 15:21:03

Not really, over time, it's started to happen more and more often, at different times of the day.I do have root access to the server. It's a Windows one, so no "crontabs", but "Scheduled Tasks" has nothing in it.

Daniel Magliola 2009-09-29 17:00:38

Answer 3

+3 A:

Not Enough Memory

This is very likely a Memory problem, perhaps aggravated or triggered by other things, but still inherently a memory problem. there are two other (less likely) possibilities, that you should check and eliminate first (because it is easy to do so):

Easy To Check Possibilities:

You may have "Auto Close" enabled: Auto Close can have exactly this behavior, however it is rare for it to be turned on. To check this, in SSMS right-click on your application database, select "Properties", and then select the "Options" pane. Look at the "Auto Close" entry and make sure that it is set to False. Check tempdb also.
SQL Agent Jobs may be causing it: Check the Agent's History Log to see if there were any jobs consistently running during the events. Remember to check maintenance jobs too, as things like Rebuilding Indexes are frequently cited as performance problems while they are running. These are unlikely candidates now, only because they would not normally be affected by the Profiler.

Why It Looks Like a Memory Problem:

If those do not show anything, then you should check for memory problems. I suspect Memory as the cause in your case because:

You have 1 GB of Memory: Although this is technically above the Minimum for SQL Server, it is way below the recommended for SQL Server, and way below what in my experience is acceptable for production, even for a lightly loaded server.
You are running IIS and SQL Server on the same box: This is not recommended by itself, in large part because of the contention for memory that results, but with only 1 GB of memory it results in IIS, the app, SQL Server, the OS and any other tasks and/or maintenance all fighting for very little memory. The way the Windows manages this is to give memory to the active processes by aggressively taking it away from the non-active processes. It can take many seconds, or even minutes for a large process like SQL Server to get back enough of its memory to be able to completely service a request in this situation.
Profiler made 90% of the problem go away: This is a big clue that memory is likely the problem, because typically, things like Profiler have exactly this effect on this particular problem: the Profiler task keeps the SQL Server just a little bit active all of the time. Frequently, this is just enough activity to either keep it off the OS's "scavenger" list, or at least reduces it's impact somewhat.

How to Check For Memory as the Culprit:

Turn Off the Profiler: Its having a Heisenberg effect on the problem, so you have to turn it off or you will not be able to see the problem reliably.
Run a System Monitor (perfmon.exe) from another box, that remotely connects to the perfomrance collection service on the box that your SQL Server and IIS are running on. you can most easily do this by first removing the three default stats (they are local only), and then add in the needed stats (below), but make sure to change the Computer name in the first drop-down to connect to your SQL box.
Send the collected data to a file by creating a "Counter Log" on perfmon. If you are unfamiliar with this, then the easiest thing to do is probably to collect the data to a tab or comma separated file that you can open with Excel to analyze.
Set up your perfmon to collect to a file and add the following counters to it:

-- Processor\%Processor Time[Total]

-- PhysicalDisk\% Idle Time[for each disk]

-- PhysicalDisk\Avg. Disk Queue Length[for each disk]

-- Memory\Pages/sec

-- Memory\Page Reads/sec

-- Memory\Available MBytes

-- Network Interface\Bytes Total/sec[for each interface in use]

-- Process\% Processor Time[see below]

-- Process\Page Faults/sec[see below]

-- Process\Working Set [see below]
For the Process counters (above) you want to include the sqlserver.exe process, any IIS processes, and any stable application processes. Note that this will ONLY work for "stable" processes. Processes that are continually being re-created as needed, cannot be captured this way because there is no way to specify them before they exist.
Run this collection to a file during the time that the problem most frequently happens. Set the collection interval to something close to 10-15 secs. (this collects a lot of data, but you will need this resolution to pick out the separate events).
After you have one or more incidents, stop the collection and then open your colleced data file with Excel. You will probably have to reformat the timestamp column to be usefully visible and show hours minutes and seconds. Use your IIS log to find the exact time of the incidents, then look at the perfmon data to see what was going on before and after the incident. In particular you want to see if its working set was small before and was large after, with a lot of page faulting in between. That's the clearest sign of this problem.

SOLUTIONS:

Either separate IIS and SQL Server onto two different boxes (preferred) or else add more memory to the box. I would think that 3-4 GB should be a minimum.

What About That Weird EF Stuff?

The problem here is that it is most likely either peripheral or only contributory to your main problem. Remember that Profiler made 90% of your incidents go away, so what remains, may be a different problem, or it may be only the most extreme aggravator of the problem. Because of its behavior I would guess that it is either cycling its cache or there is some other background maintenance of the application server processes.

RBarryYoung 2009-09-29 16:21:28

Thanks for your comprehensive answer!Let's see...Auto-close is off. The only agent job is the backup, and we've all of our errors have been outside the backup hours.Memory: The system reports memory usage is 807Mb, although what you say DOES make a lot of sense, and it correlates with something else we're seeing. I just set up a cron in a different server to make a web request every minute, and I haven't had any errors since then... Not enough days have passed for my taste, but it looks promising... This supports the theory that the SQL Server is being sent off to disk.

Daniel Magliola 2009-09-29 17:13:20

As for the weird EF stuff... It ended up being a second website in that server, with a even lower load than the one website where i'm having this problem, that is very badly coded, and ends up throwing hundreds of queries to the DB for each page load.We have disabled that app, and the frequency of error reports seems to have decreased, but we still have them.The fact that that memory is not taking up any more memory is probably the reason for the decrease, I guess

Daniel Magliola 2009-09-29 17:17:15

What really frustrates me is that I have a W2003 box with only 1gb of ram running MANY web apps, plus sql server, etc, and it's worked fine for years (and it really deserves an upgrade), while this box has only this very light load, very small website, with a tiny database, and it's apparently dying because of it...If this is really the reason for the problem, then W2008 and SQL2008 are a HUGE memory hog, compared to older versions...

Daniel Magliola 2009-09-29 17:18:52

RBarryYoung 2009-09-29 22:06:26

We're running into this same issue and memory *shouldn't* be a problem...we have 40 gigs on the SQL Server, and IIS is on a different box. Was this resolved in any way?

kamens 2010-03-24 15:41:38

RBarryYoung 2010-03-26 03:00:48

ansaurus

tags:

views:

answers: