views:

236

answers:

6

Hi everybody,

I am thinking about using a noSQL (mongoDB) paired with memcached to store sessions with in my webapp. The idea is that upon each page load, the user data is compared to the data in the memcache and if something has changed, the data would be written to both memcached and mySQL. This way the reads would be greatly reduced and memcached utilized to do what it does best.

However I am a bit concerned about using a non-ACID database for session storage especially with the memcached layer. Let's say something goes wrong while updating the session to the DB and our users got instant headache wondering why their product that they put in the cart doesn't show up...

What's an appropriate approach to this? Should we go for a mySQL session storage or is it fine to keep a non-acid supportive database for sessions?

Thanks!

+1  A: 

Without any locking on the session, be really, really careful of what you are storing. Never ever store anything that is dependent on what you have read before as the data might change between you reading and writing - especially in case of ajax where multiple requests can go out at once.

An example what you must not store in a non-locked session would be a shopping cart as, to add a product, you have to read, unserialize, add the product and then serialize again. If any other request does the same thing between the first requests read and write, you lose the second request's data.

Have a look at this article for detail: http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/

Keep Sessions on your filesystem (where PHP locks them for you), in your database (where you have to do manual locking) or never, ever, write anything of value to your session if that value is derived of a previous read.

pilif
Hi! That's definitely worrying with the locking. However going for the filesystem sessions won't do the trick for us since we will be load balancing soon...
Industrial
You can configure most load balancers to have requests belonging to one session always go to the same machine (session affinity). If that's not desirable, use a database in conjunction with advisory locks or don't write data to the session that is dependent on data previously read.
pilif
+1  A: 

While using memcached as a cache for database, it is the user who have to ensure the data consistency between database and cache. If you'll want to scale up and add more servers there is a probability to be out of sync with database even if everything seems ok.

Instead you may consider Hazelcast. As of 1.9 it also supports memcache protocol. Compared to memcached Hazelcast wants you to implement Map Persister and only itself updates the database for the updated entries. This way you don't have to handle "check cache, if data changed update database" kind of stuff.

Fuad Malikov
+2  A: 

I'm using MongoDB as session storage currently. It is possible to avoid race conditions mentioned by pilif. I found a class that implements a session handler for MongoDB (http://www.jqueryin.com/projects/mongo-session/) and forked it on github to suit my needs (http://github.com/halfdan/MongoSession).

halfdan
+1  A: 

If you write your app so that the user stores all session information client side, then you just verify that information as needed, you won't need to worry about sessions on the server side. This is one of the principles in REST style architecture. For instance, if the user is requesting adding an item to their shopping cart, just store the itemID list and count on the client side. When you hit the cart page, you can easily look up the item information from the list of itemIDs they are telling you are in their cart.

During checkout, go directly against the database with transactions to ensure you aren't getting any race conditions, and check your live inventory. If inventory isn't there when they go to check out, just say, "sorry, we just sold out". Of course, at that point you should go update any caches you have out there that are telling people you have inventory.

Zak
So it will depend on cookies you say?
Industrial
+1  A: 

If you don't want to lose your data, stick with ACID tested databases.

What's the payoff you're looking for?

If you want a secure system, you can't trust anything from the user, save for perhaps selected integers, so letting them store the information is typically a really bad idea.

I don't see the payoff for storing sessions outside of your MySQL database. You can chron cleanup on the tables if that's your concern, but why bother? Some users will shop on a site and then get distracted for a while. They would then come back a day or two later.

If you use cookies or something really temporary to store their session info, there is a really good chance their shopping time was wasted. Users really value their time... so if you stored their session info in the database, you can write something sexy to manage that data.

Plus, the nice side effect of this is that you'll generate a lot of residual information about what people like on your website that wouldn't perhaps be available to you later on. Like you could even consider some of it to be like a poll or something where the items people are adding to their cart could impact how you manage your business, order inventory or focus your marketing.

If you go with something really temporary then you lose out on getting residual benefits.

Geekster
Thanks Geekster. We wanted to use Mongo since that's where we will store the main part of our data. Your concerns about the residual information was however great to take into account, we hadn't thought about them even if they were kind of obvious ...
Industrial
So obviously you're using Mongo because of performance. You can cut corners to get performance, but I think stability is greater than performance. I've run a 1.4mil record lookup web app for MySQL that was Slashdotted and still managed to perform very well under stress. What kind of optimization have you done on your code that has made you feel that Mongo is needed?
Geekster
It's basically due to the huge amount of joins that we need to do in a SQL database to fetch a dataset. Joins are huge performance thieves, so for that reason only we decided to go for the noSQL route...
Industrial
Thanks for the great answer Geekster. The residual information was totally out of our concerns - big ups to you. The bounty is yours!
Industrial
Thanks for the prize! :) Shopping psychology identifies factors that impact how people make purchases. Only if everything is PERFECT will someone complete a purchase. So with this residual info you need to consider why someone exits a sale. What page were they on when they left? What were they looking at? Where was their mouse? Did they not manage to get Paypal set up correctly? Did they check your competitors and find a cheaper price mid-sale? Some companies put competitor pricing with offsite links just so they KNOW. You can code an offsite script that tracks wherever they go, ethically.
Geekster
Follow-up: why do joins at all? Normalization with record id or hash indexes on MySQL have helped me survive a Slashdotting. Sync is sometimes hard to visualize ahead of time but it really pays off to have someone figure out how you can optimize. 9/10 you'll find that whatever you wanted to do isn't that important to the customer, so if it is bogging your site down then it's time to get someone in to normalize it all.
Geekster
+1  A: 

I would look at how much the user costs to acquire and then ask what is the cost for implementing a really good system. Keep in mind that users are a biological retry method. "I'm bored... press reload again..." While, this isn't the most perfect solution, it is sometimes acceptable vs the cost comparsion for "not lose anything - ever".

If you want additional security, you can have your sessions cached to a separate set of memcache servers so there are no accidental flushes. :)

There are a number of other systems membase.org, and some other persistent memcache solutions (java implementations) that will persist storage to disk. If you want to modify your client somewhat, or how you access memcache, you can do your own replication of memcache session objects.

-daniel

Daniel