I have an ASP.NET 2.0 site that stores a user's ID in session to indicate that they are logged in. In some situations, the user doesn't appear to stay logged in. I've been monitoring traffic in Fiddler, and some details I've found:

  • The problem is 100% repeatable on an older laptop of mine when running IE7 and the project manager's laptop when running IE7. The problem does not ever occur on my current laptop running IE7, or any of these laptops when running FF.
  • The problem occurs only in production--not on development, internal staging, or client staging. Production is the only load balanced environment, but the repeatability noted above makes me question load balancing as a factor.
  • When the page which sets Session("ID") = 1 sends a response back to the client, I can see a "Set-Cookie" header in all cases, which creates the ASP.NET_SessionId cookie (and it's HttpOnly).
  • Subsequent requests to the server will send that cookie in the header on machines which are not exhibiting the problem, but not on machines that are, so either the cookie is getting deleted or the "Set-Cookie" header is being ignored.
  • The way logging in works is as follows: a page on www.DomainX.com has an iframe. The source of that iframe is a page on login.DomainY.com. A variety of pages served from login.DomainY.com take the user through the login/register process. The final step of login.DomainY.com is to redirect to a page back on www.DomainX.com, including the user's ID in the querystring. This page on www.DomainX.com typically stores the ID in session, and then runs some JS to redirect the top level document to a new page, thus taking the user out of the iframe. This is a process that has worked for several years, with several values of DomainX.com. The one thing that may be different here is that in this case, the JS is simply destroying the iframe and some containing divs.
  • Another difference I see between scenarios where the problem occurs and where it doesn't is in the Google Analytics cookies. There is a difference when login.DomainY.com/FinalStep.aspx does its redirect to www.DomainX.com/SaveTheID.aspx inside the iframe. When the problem does not occur, the request for SaveTheID.aspx includes a variety of Google Analytics cookies (__utma, __utmz, etc). When the problem does occur, this request does not include all the GA cookies (it's missing __utma, __utmz and __utmb).
  • Production is the only environment where login.DomainY.com runs under SSL, so I thought that may be related. But we temporarily set up our staging copy of login.DomainY.com to use SSL, and that had no effect.

Any ideas what could cause this?

Edit: the production environment has domains of www.DomainX.com and DomainX.com. There is another known issue with the cookies not being set for both of those domains. It's possible this is related, but I won't be able to test until that fix goes to prod.

+2  A: 

You will want to look at your session state provider to see whether it works across the two servers/instances of the .NET application. If it is set to InProc, for example, you will most definitely run into this problem, because each session lives in the memory of the worker process that created it. Instead, move session state out of process: either to the ASP.NET State Service, which both machines can access, or, even better, to a distributed caching solution such as Microsoft's Velocity project, which distributes the sessions across the two machines in case one goes down.
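
For reference, a minimal sketch of what an out-of-process setup might look like in web.config. The host name `stateserver` is a placeholder, not something from the question; 42424 is the default port of the ASP.NET State Service, and the timeout is just the framework default.

```xml
<configuration>
  <system.web>
    <!-- "stateserver" is a hypothetical host name; replace it with the machine
         running the ASP.NET State Service. 42424 is that service's default port. -->
    <sessionState mode="StateServer"
                  stateConnectionString="tcpip=stateserver:42424"
                  cookieless="false"
                  timeout="20" />
  </system.web>
</configuration>
```

Keep in mind that with any out-of-process mode, everything you put into session has to be serializable.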

Some other ways to deal with this are to use sticky sessions on the load balancer (not recommended) or move to a cookie-less session which would work but may cause some headaches in your code.
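
As a rough illustration of the cookie-less option, ASP.NET 2.0 lets you move the session ID from a cookie into the URL via web.config. A minimal sketch, assuming the rest of your session configuration stays as it is:

```xml
<configuration>
  <system.web>
    <!-- UseUri embeds the session ID in the URL path instead of sending a cookie -->
    <sessionState cookieless="UseUri" />
  </system.web>
</configuration>
```

With this setting the ID appears in the URL as a segment of the form /(S(...))/, so any absolute link or cross-domain redirect that drops that segment loses the session. That is the kind of headache to expect in code.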

In our business we have a primary and a secondary server with a distributed cache behind them, so that if one machine goes down the other can take over. The same principle applies to load balancing: once you have more than one machine, or even more than one instance in the application pool, you will have to code for this.

If you use Velocity for session, make sure that the cache you select for storing your sessions is non-evictable.

Middletone
Could you explain why using sticky sessions on the load balancer is not recommended? I always thought that was a *good* idea.
Jacob
Sticky sessions cause the request to go to the same server all the time. This means that if a server goes down then the request doesn’t get re-routed. If it does get re-routed, the session will be lost unless the program has built-in redundancy for session state and the like. Non-sticky sessions force you to build redundancy into your app and design it to scale out as well as to survive the failure of a node. Lastly, non-sticky sessions let your router manage traffic more optimally: requests are distributed more evenly amongst the nodes, as it doesn’t matter which box fields the request.
Middletone
A: 

I think Middletone is right except: it does not explain why the problem can't be repro'd with Firefox. The load balancer is the no.1 suspect; it'd be good to see if it's got an alibi by taking all but one of the app servers offline (if feasible, and during a time when it can handle the load) and see if the problem still exists. If so, it's not the load balancer and you can start looking somewhere else. If not, it's the load balancer.

BTW: Sticky sessions are bad because the sessions held on a given server are not protected by redundancy. In addition, the load balancer cannot send each request to whichever server is least loaded at that moment; it can only decide at the start of a session and then keep the user where he/she is.

If it turns out you've got a load balancer problem here, the first thing I'd do is turn on session stickiness, and then maybe look for another solution with the soothing background of a working production environment.

GreenIcicle
A: 

Shoot, I must have lost my cookie and ability to reply and edit...

I did explore the load balancer as the issue a little bit by modifying my hosts file to point directly to the IPs of each of the web servers, and that didn't have any effect. I think the client's IT staff is going to balk at asking to shut off load balancing.

We do have a separate state server running, and it's the same one that's been used for a few years on other sites hosted by the same servers. Not necessarily without problems, but without a problem like this.

As a band-aid, I'm currently testing other persistence mechanisms...

Joel