Currently our DNS routes the user to the correct datacenter and then we have a round-robin situation for the servers. We currently store the session information in the cookie, but it's grown too large, so we want to move it out of the browser and into a database. I'm worried that if we create a mid-tier box that they all hit, the response times will be affected. It's not feasible to store the session info on all machines because we're talking about 200M+ unique sessions a month. Any suggestions, thoughts?

A: 

This sounds like a problem for Server Fault.

OlduvaiHand
nah, it's a development issue. He needs to change his architecture.
Steve Claridge
+2  A: 

A job for memcached or, if you want to save session data to disk, MemcacheDB.

Memcached is a free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
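
For illustration, here is a minimal sketch of reading and writing a session record in memcached, using the pymemcache Python client; the key prefix, TTL and host are assumptions, not anything from the question:

import json
from pymemcache.client.base import Client

# Connect to one memcached node; host and port are placeholders.
cache = Client(("localhost", 11211))

def save_session(session_id, data, ttl=1800):
    # Sessions are stored as JSON strings and expire after `ttl` seconds.
    cache.set("session:" + session_id, json.dumps(data), expire=ttl)

def load_session(session_id):
    raw = cache.get("session:" + session_id)
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 42, "cart": ["sku-1", "sku-2"]})
print(load_session("abc123"))

MemcacheDB speaks the same protocol, so essentially the same client code applies when the data has to persist to disk.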

Steve Claridge
It's not temporary session data, it's permanent. Think of how Google stores your user information when you log in and how it's available everywhere you go on the network. We've used MemcacheDB in the past, but with 200M keys the Berkeley DB backend started to break down.
Ryan Detzel
We had a similar situation where we outgrew our one central session store. Our solution was to add another physical box (and storage app) as a second independent store. We only keep session data on one of the boxes, and to determine which box the data is on we do (sessionId % 2) to discover where to read/write the data (a sketch follows below). If we want to add a third box it's a simple case of doing (sessionId % 3) instead.
Steve Claridge
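
A minimal sketch of the modulo routing from the comment above, assuming numeric session ids and a fixed list of store addresses (the host names are placeholders):

# Independent session stores; adding a box means appending here.
SESSION_STORES = ["store-a.internal:11211", "store-b.internal:11211"]

def store_for(session_id: int) -> str:
    # sessionId % N picks which box holds this session's data.
    return SESSION_STORES[session_id % len(SESSION_STORES)]

print(store_for(1001))  # store-b.internal:11211
print(store_for(1002))  # store-a.internal:11211

One caveat with this scheme: changing the number of stores remaps most existing sessions, so adding a box implies a re-shard (or switching to a consistent-hashing scheme).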
+1  A: 

Let's understand the role of browser-based cookies

  • Cookies are stored per browser profile.
  • The same user logged on from different computers or browsers is considered different users.
  • State cookies are mixed with user cookies.

Segregate the cookies:

  • Long-term state cookies, e.g. the currently-remembered userId.
  • Session-state cookies.
  • User cookies.

Reading that your site is only beginning to consider server-side cookies implies that a segregation of cookies has not yet been done. User cookies should be stored on the server as much as possible, so that when a user logs on at another computer or browser, the preferences and shopping carts are preserved. Your development team has to decide, for some cookies such as shopping carts, whether they should be session-state or user-info cookies.

User cookies need to be accessible across the web site, regardless of where the user logs in. Your developers have to decide, when a user updates a preference or shopping cart, how immediately that change should be visible if the same userId is logged in at another location.

That means you have to implement a distributed database system: a master db server and, say, 20 web servers, each with its own local database.

Store only frequently changed cookies on the local db and leave the infrequently changing cookies on the master.

Every time a cookie is updated at a local db, an update flag is queued for the master. The cookie record in the master is not updated, only marked as stale, with the location number where the fresh data resides. So if that userid somehow gets activated 3000 miles away simultaneously, that session would find the stale records, trigger a request to copy those records from the fresh location to its own local db and to the master db, and the records would no longer be marked as stale on the master db.
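
A rough sketch of that read path, using in-memory dicts as stand-ins for the master and local dbs (the record layout and all names here are made up for illustration):

# master maps (user_id, param) -> {"value": ..., "stale": bool, "fresh_location": local_db_id}
master = {}
local_dbs = {0: {}, 1: {}}   # local_db_id -> {(user_id, param): value}

def write_cookie(user_id, param, value, local_db_id):
    # Write locally, then mark the master copy stale and record where the fresh data lives.
    local_dbs[local_db_id][(user_id, param)] = value
    master[(user_id, param)] = {"value": None, "stale": True, "fresh_location": local_db_id}

def read_cookie(user_id, param, local_db_id):
    key = (user_id, param)
    if key in local_dbs[local_db_id]:
        return local_dbs[local_db_id][key]
    row = master[key]
    if row["stale"]:
        # Copy from the fresh location to our local db and repair the master record.
        fresh = local_dbs[row["fresh_location"]][key]
        local_dbs[local_db_id][key] = fresh
        master[key] = {"value": fresh, "stale": False, "fresh_location": local_db_id}
        return fresh
    local_dbs[local_db_id][key] = row["value"]
    return row["value"]

write_cookie(42, "cart", ["sku-1"], local_db_id=0)
print(read_cookie(42, "cart", local_db_id=1))  # pulls the fresh copy across and clears the stale flag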

Then you schedule a regular sync of the most frequently used cookies. The frequency of sync could be nightly, or depend on the results of your characterization of how often cookies are modified.

First, your programmers would need to write a routine to log all cookie read/writes. You should collect a week's worth of cookie read/write activity to perform your initial component analysis.

You perform a simple statistical characterization, per cookie and per userid, of the frequency of change. Then you slide along your preferences, deciding which cookies are pushed to all the local dbs and which stay on the master. The decision balances the size of the cookie block on the local dbs against the frequency of database sync you are willing to allow, which means not every user has the same set of cookies propagated. Of course, your programmers would need to write routines to automate the regular recharacterization. Rather than working per user, you might wish to lighten the processing load of cookie propagation by grouping your users with cluster analysis. Maybe the grouping of users for your site is so obvious that you need not perform cluster analysis at all.

You might be surprised to find that most of the cookies fall into the longer-than-weekly-update bucket, or in a worse case the daily-update bucket; the worst case you should accept is hourly updates for cookie fields which are not pushed onto the local dbs. You want to increase the chances that a cookie access happens on the local db rather than being pulled from the master database. So when a user decides to click on "preferences", which is seldom changed, you preemptively pull the preferences records from the master while distracting the user with some frills like "have you considered previewing our new service?", "would you like to answer our usability survey?", "new Gibson rant, would you comment?", etc., until the "preferences" cookies are copied over.

The characterization of cookies could be done per userid, or per cluster of users, to decide which cookie fields to push out to the local dbs.

It is simpler to characterize per userid because it barely involves any statistical analysis skills on the part of the programmer. The disadvantage is that the web server would have to make decisions for each of 200 million users. The database cookie table would be

Cookie[id, param, value, expectedMutationInterval].

Your web server would decide, per user, which cookies to push regularly based on the threshold time:

SELECT param, value
FROM Cookie
WHERE expectedMutationInterval < $thresholdTime
  AND id = $userId

You have to perform a regular recharacterization of cookies to update expectedMutationInterval per user per cookie. A simple SQL query would be able to perform the update of expectedMutationInterval. A more complex analysis could be performed to produce the value expectedMutationInterval.

If each cookie field change is logged by time, userid and ipaddr then your Cookie log table would be

CookieLog[id, time, ipaddr, param, value].

which would help your automated recharacterization routine decide what fields to push depending on the dayofweek/month/season and location/region/ipaddr.
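
As a rough illustration of that recharacterization, the sketch below computes an expectedMutationInterval per (id, param) from CookieLog rows held in memory; the row layout follows the table above, but the averaging rule and timestamps are assumptions:

from collections import defaultdict

# CookieLog rows as (id, time, ipaddr, param, value); times in seconds for simplicity.
cookie_log = [
    (42, 0,     "10.0.0.1", "cart",  "sku-1"),
    (42, 3600,  "10.0.0.1", "cart",  "sku-1,sku-2"),
    (42, 86400, "10.0.0.2", "theme", "dark"),
    (42, 90000, "10.0.0.2", "cart",  "sku-2"),
]

def expected_mutation_intervals(rows):
    # Average gap between successive writes of the same (id, param).
    times = defaultdict(list)
    for uid, t, _ip, param, _value in rows:
        times[(uid, param)].append(t)
    intervals = {}
    for key, ts in times.items():
        ts.sort()
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        intervals[key] = sum(gaps) / len(gaps) if gaps else None  # None: only one write seen
    return intervals

print(expected_mutation_intervals(cookie_log))
# {(42, 'cart'): 45000.0, (42, 'theme'): None}

The resulting values would then be written back to Cookie.expectedMutationInterval for the query above to use.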

Then, after removing user info cookies from the browser, if you still find your session cookies overflowing, you decide which session cookies to push to the browser and which stay on the local server. You use the same master-local db analysis technique, but now to decide between the local db and pushing to the browser. You leave your least frequently accessed session cookies on the local server, either as session attributes or in an in-memory db. So when a client finds a cookie is missing, it makes a request to the server for that cookie, while sacrificing some least recently/frequently used cookie space on the browser to accommodate the fresh cookie.

Since these are session cookies, they need not be propagated to other locations, because if the same userid is logged on 3000 miles away, it should have its own set of session cookies.

Characterization of browser cookies is an irony because, for AJAX apps, the client accesses the cookies without letting the server know. Letting the server know might defeat the purpose of placing the cookies in the browser in the first place. So you would have to choose idle times to send cookie accesses to the server for logging, purely for characterization purposes.

Such a level of granularity is good for cookies that are short in length (parameter name + parameter value), be they session-based or user-based cookies.

Therefore, if the parameter names and values of your cookie fields are long, you might seek to quantize them. However, quantization is a little more complex. Browser cookies have a lot of commonality. Just like any quantization/compression method, you look for the clusters of commonalities and assign each commonality block a signature. Then the cookies are stored in terms of the quantized signatures.

How do you facilitate quantization of browser-based cookies? Using GWT as an example, use the Dictionary or Map class.

e.g., the cookie "%1"="^$Kdm3i" might translate to LastConnectedFriend=MohammadAli@jinnah.

You may not even need to perform characterization for this: for example, why store your cookie as "LastConnectedFriend" when you could map it to "%1"? When a user logs in, why not map the most frequently accessed friends, etc., and place that map on the GWT/AJAX launching page? In that way you could shorten your session cookie lengths.
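
A toy sketch of that dictionary-based quantization in Python (the mappings are made up, echoing the example above; in a GWT app the equivalent map would live on the launching page):

# Short signatures for long cookie names/values; generated per user or per release.
NAME_MAP = {"LastConnectedFriend": "%1", "PreferredTheme": "%2"}
VALUE_MAP = {"MohammadAli@jinnah": "^$Kdm3i"}

NAME_UNMAP = {v: k for k, v in NAME_MAP.items()}
VALUE_UNMAP = {v: k for k, v in VALUE_MAP.items()}

def quantize(name, value):
    # Fall back to the literal strings when no signature exists.
    return NAME_MAP.get(name, name), VALUE_MAP.get(value, value)

def dequantize(name, value):
    return NAME_UNMAP.get(name, name), VALUE_UNMAP.get(value, value)

print(quantize("LastConnectedFriend", "MohammadAli@jinnah"))  # ('%1', '^$Kdm3i')
print(dequantize("%1", "^$Kdm3i"))                            # ('LastConnectedFriend', 'MohammadAli@jinnah')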

So, is your company looking for a statistical programmer? Disclaimer: this is written off-the-cuff and might need some factual realignment.

Blessed Geek
