views:

23

answers:

1

We have been receiving reports of the following server error periodically from users.

[OutOfMemoryException: Exception of type System.OutOfMemoryException was thrown.]
[HttpException (0x80004005): Unable to serialize the session state. Please note that non-serializable objects or MarshalByRef objects are not permitted when session state mode is ‘StateServer’ or ‘SQLServer’

Once in a state where this error appears, it appears to be hit or miss whether the errors are reproducible locally. If they are, then we can usually reproduce them for a couple minutes, but not on every page hit. This usually tapers off on its own and usually has resolved itself by the time we get back in contact with the users.

The Web Service has around 90-100 active connections during business hours. The only other site on this server is the staging version of this site, which gets hit very infrequently. The Session State is stored on the same SQLServer instance as the application database which is housed on a fairly large cluster of virtual machines. Neither the Web Server or the SQLServer seemed to be taxed (either processor or memory-wise) while this is going on.

The distribution of which pages are erroring seems to be comparable to the normal distribution for each page. There doesn't appear to be any pattern in terms of times of occurrence. We do have less errors on average on weekends (which correlates to normal site load), but even this appears to not be consistent.

There also doesn't appear to be a correlation between the errors logged and any kind of logged performance monitor events. This includes an array of perfmon counters including:

.NET CLR Jit(w3wp)\notal # of IL Bytes Jitted  
.NET CLR Jit(w3wp)\IL Bytes Jitted / sec  
.NET CLR Jit(w3wp)\% Time in Jit  
.NET CLR Jit(w3wp)\# of Methods Jitted  
.NET CLR Jit(w3wp)\# of IL Bytes Jitted  
ASP.NET Apps v1.1.4322(__Total__)\Requests Failed  
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution/Sec  
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution  
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Turnover Rate  
ASP.NET Apps v1.1.4322(__Total__)\Errors During Preprocessing  
ASP.NET Apps v1.1.4322(__Total__)\Errors During Execution  
ASP.NET Apps v1.1.4322(__Total__)\Requests Executing  
ASP.NET Apps v1.1.4322(__Total__)\Requests Total  
ASP.NET Apps v1.1.4322(__Total__)\Errors Total  
ASP.NET Apps v1.1.4322(__Total__)\Sessions Abandoned  
ASP.NET Apps v1.1.4322(__Total__)\Errors Total/Sec  
ASP.NET Apps v1.1.4322(__Total__)\Anonymous Requests/Sec  
ASP.NET Apps v1.1.4322(__Total__)\Requests/Sec  
ASP.NET Apps v1.1.4322(__Total__)\Session SQL Server connections total  
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Hit Ratio  
ASP.NET v1.1.4322\Requests Current  
ASP.NET v1.1.4322\Request Execution Time  
Memory\Pages/sec  
Bytes Total/sec  
PhysicalDisk(_Total)\Avg. Disk Queue Length  
Processor(_Total)\% Processor Time  
Web Service Cache\File Cache Hits %  
Web Service Cache\File Cache Misses  
Web Service Cache\File Cache Hits  
Web Service(_Total)\Current Connections  
Web Service(_Total)\Post Requests/sec)

The only pattern I can see in the logs doesn't correlate to the occurrence of these errors, but is the only pattern I can see. Looking at the perfmon logs we are seeing a pattern where the "Total # of IL Bytes Jitted", "IL Bytes Jitted / sec", "% Time in Jit", "# of Methods Jitted", and "# of IL Bytes Jitted" counters for the staging site (which shouldn't be getting any traffic) doesn't pull data for a 20-50 minute period after which there is an immediate spike in "IL Bytes Jitted / sec" and a jump in "% Time in Jit" for 2-20 minute of up to 99% for the main site.

If anyone has any ideas on what could be causing this, or has had experience with a similar issue I would be grateful for any input.

Thanks!

A: 

This is a wild stab, since I had a similar problem recently (not exactly the same).

Are you using the /3GB flag at startup for your server?

Even if you're not, you may take a look at the Free System Page Table Entries via perfmon (under Memory). You should have in access of 15K. Anything under 5-10K is "bad", and can result in OOM exceptions when saving to session.

http://blogs.technet.com/b/clint_huffman/archive/2008/04/07/free-system-page-table-entries-ptes.aspx

Phil Sandler
Thanks Phil,The server does not have the /3GB flag set and it looks like the Web Server is sitting at 180701 Free System Page Table Entries.I'll add this counter to my logs and see if we see a drop in this when the error is occurring. (BTW, how are you adding line breaks?)
Daniel
It looks like the Free System Page Table Entries is sitting pretty close to 180k (it maintained between 180k and 181k for the remainder of Friday and all weekend).
Daniel