views: 321
answers: 4

I'm currently working with a web application written in Python (and using SQLAlchemy). In order to handle authentication, the app first checks for a user ID in the session and, provided it exists, pulls that whole user record out of the database and stores it for the rest of that request. A second query is then run to check the permissions of the user it has stored.

I'm fairly new to the web application development world, but from my understanding, hitting the database for something like this on every request isn't efficient. Or is this considered a normal thing to do?

The only thing I've thought of so far is pulling up this data once, and storing what's relevant (most of the data isn't even required on every request). However, this brings up the problem of what's supposed to happen if this user record happens to be removed in the interim. Any ideas on how best to manage this?
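To make the pattern concrete, here is a minimal sketch of the per-request lookup being described, including a guard for the "user record removed in the interim" case. It uses stdlib sqlite3 purely for illustration (the app itself uses SQLAlchemy), and all table, column, and function names are hypothetical:

```python
import sqlite3

# Stand-in database; the real app would use its SQLAlchemy session instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE permissions (user_id INTEGER, perm TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO permissions VALUES (1, 'admin')")

def load_current_user(session):
    """Run on every request: two queries, plus a deleted-user guard."""
    user_id = session.get("user_id")
    if user_id is None:
        return None
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        # The user was removed in the interim: invalidate the session.
        session.pop("user_id", None)
        return None
    perms = [p for (p,) in conn.execute(
        "SELECT perm FROM permissions WHERE user_id = ?", (user_id,))]
    return {"id": row[0], "name": row[1], "permissions": perms}

session = {"user_id": 1}
print(load_current_user(session))
# {'id': 1, 'name': 'alice', 'permissions': ['admin']}
```

If the row is gone, the helper simply logs the session out, which is one straightforward answer to the removed-record problem.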

+1  A: 

Hi Sek, it's a database, so it's fairly common to "hit" it to pull the required data on each request. You can reduce the number of individual queries by combining them with joins or stored procedures.
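As a sketch of the join suggestion: the two queries from the question (user row, then permissions) can be collapsed into one round trip. This uses stdlib sqlite3 for illustration, with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permissions (user_id INTEGER, perm TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO permissions VALUES (1, 'admin'), (1, 'editor');
""")

# One query instead of two: user and permissions fetched together.
rows = conn.execute("""
    SELECT u.id, u.name, p.perm
    FROM users u LEFT JOIN permissions p ON p.user_id = u.id
    WHERE u.id = ?
    ORDER BY p.perm
""", (1,)).fetchall()

user = {"id": rows[0][0], "name": rows[0][1],
        "permissions": [r[2] for r in rows if r[2] is not None]}
print(user)
# {'id': 1, 'name': 'alice', 'permissions': ['admin', 'editor']}
```

A LEFT JOIN is used so a user with no permission rows still comes back (with an empty permission list).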

Henrik P. Hessel
+1  A: 

"hitting the database for something like this on every request isn't efficient."

False. And, you've assumed that there's no caching, which is also false.

Most ORM layers are perfectly capable of caching rows, saving some DB queries.

Most RDBMS's have extensive caching, resulting in remarkably fast responses to common queries.

All ORM layers will use consistent SQL, further aiding the database in optimizing the repetitive operations. (Specifically, the SQL statement is cached, saving parsing and planning time.)

" Or is this considered a normal thing to do?"

True.

Until you can prove that your queries are the slowest part of your application, don't worry. Build something that actually works. Then optimize the part that you can prove is the bottleneck.

S.Lott
+2  A: 

You are basically talking about caching data as a performance optimization. As always, premature optimization is a bad idea. It's hard to know where the bottlenecks are beforehand, even more so if the application domain is new to you. Optimization adds complexity and if you optimize the wrong things, you not only have wasted the effort, but have made the necessary optimizations harder.

Requesting user data is usually a pretty trivial query. You can build yourself a simple benchmark to see what kind of overhead it will introduce. If it isn't a significant percentage of your time budget, just leave it be.
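A micro-benchmark along those lines can be a few lines of `timeit`. This sketch times the lookup against an in-memory SQLite database; a meaningful test would point at your actual database over your actual network:

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def lookup():
    """The per-request query being measured."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (1,)).fetchone()

n = 1000
total = timeit.timeit(lookup, number=n)
print(f"{total / n * 1e6:.1f} microseconds per lookup")
```

Compare the per-lookup figure against your per-request time budget before deciding any caching is worth the complexity.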

If you still want to cache the data on the application server then you have to come up with a cache invalidation scheme.

One scheme is to check the database for changes before trusting the cached copy. If you don't have a lot of data to cache, this isn't significantly more efficient than just reloading it.

Another option is to just time out cached data. This is a good option if instant visibility of changes isn't important.
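The time-out option can be a small wrapper around a dict. This is a minimal TTL-cache sketch (class and key names are made up for illustration); entries expire after `ttl` seconds, so staleness is bounded even though changes aren't instantly visible:

```python
import time

class TTLCache:
    """Minimal time-out cache: entries expire after `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl=0.05)
cache.set("user:1", {"name": "alice"})
fresh = cache.get("user:1")     # within the TTL: cached dict comes back
time.sleep(0.06)
stale = cache.get("user:1")     # past the TTL: entry has expired
print(fresh, stale)
```

On a cache miss you fall through to the database and repopulate, so the worst case is simply the uncached behaviour.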

Another option is to actively invalidate caches on changes. Whether this is feasible depends on whether the database is only ever modified through your application, and on whether you have a single application server or a clustered solution.

Ants Aasma
+1  A: 

For a user login and basic permission tokens in a simple web application, I would definitely store that in a cookie-based session. It's true that a few SELECTs per request are not a big deal at all, but then again, if you can get some or all of your web requests to execute from cached data with no DB hits, that adds that much more scalability to an app that expects a lot of load.

The issue of the user token being changed in the database is handled in two ways. One is to ignore it: for a lot of use cases it's not that big a deal for the user to log out and log back in again to pick up new permissions that were granted elsewhere (witness Unix as an example). The other is that all mutations of the user row are filtered through a method that also resets the state within the cookie-based session, but this is only effective if the user him/herself is the one initiating the changes through the browser interface.
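The second approach can be sketched in a few lines. This is a hedged illustration only: the session is a plain dict standing in for a cookie-based session, and `users`, `login`, and `update_permissions` are hypothetical names:

```python
# Stand-in for the users table; a real app would hit the database here.
users = {1: {"name": "alice", "permissions": ["read"]}}

def login(session, user_id):
    """Copy the permission tokens into the session at login time."""
    session["user_id"] = user_id
    session["permissions"] = list(users[user_id]["permissions"])

def update_permissions(session, user_id, permissions):
    """All permission mutations funnel through here, so the session
    copy is refreshed in the same step and never goes stale."""
    users[user_id]["permissions"] = list(permissions)
    if session.get("user_id") == user_id:
        session["permissions"] = list(permissions)

session = {}
login(session, 1)
print(session["permissions"])   # ['read']
update_permissions(session, 1, ["read", "write"])
print(session["permissions"])   # ['read', 'write']
```

As the answer notes, this only works when the change is initiated through the same browser session; changes made elsewhere still require a logout/login (or a DB check) to become visible.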

If OTOH neither of the above use cases apply to you, then you probably need to stick with a little bit of database access built into every request.

zzzeek