views: 219
answers: 2

I had the following idea: say we have a webapp written in Django which models some kind of bulletin board. This board has many threads, but a few of them get the most posts/views per hour. The thread pages look a little different for each user, so you can't cache the rendered page as a whole, and caching only some parts of the rendered page is also not an option.

My idea was: I create an object structure of the thread in memory (with every post and all other data needed to display it). When a new message is posted, the structure is updated, and every X posts (or every Y minutes, whichever comes first) the new messages are written back to the database. If the app crashes, some posts are lost, but this is definitely okay (for users and admins).
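The write-behind buffer described above could be sketched roughly like this (a plain-Python illustration only; the class and parameter names are made up and not part of Django or any framework):

```python
# Sketch of the "flush every X posts or Y minutes" idea.
# All names here are illustrative, not a real API.
import threading
import time

class WriteBehindBuffer:
    """Holds new posts in memory and hands them to a persistence
    callback every `max_posts` posts or `max_age` seconds, whichever
    comes first. Posts still buffered at crash time are lost, which
    the question accepts as a trade-off."""

    def __init__(self, persist, max_posts=10, max_age=60.0):
        self.persist = persist          # e.g. a bulk INSERT into the DB
        self.max_posts = max_posts
        self.max_age = max_age
        self._posts = []
        self._last_flush = time.monotonic()
        self._lock = threading.Lock()   # WSGI workers may be multithreaded

    def add(self, post):
        with self._lock:
            self._posts.append(post)
            too_many = len(self._posts) >= self.max_posts
            too_old = time.monotonic() - self._last_flush >= self.max_age
            if too_many or too_old:
                self._flush_locked()

    def _flush_locked(self):
        if self._posts:
            self.persist(self._posts)
            self._posts = []
        self._last_flush = time.monotonic()
```

Such a buffer would live at module level, shared by all requests handled by one worker process; as the second answer below points out, each worker process would still have its own independent copy.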

The question: can I create such a persistent in-memory storage without serialization (so no serialize->memcached)? As I understand it, WSGI applications (like Django) run in a continuous process that doesn't shut down between requests, so it should be possible in theory. Is there any API I could use? If not, any pointers on where to look?

/edit1: I know that "persistent" usually has a different meaning, but in this case I strictly mean "persisting between requests".

A: 

An in-memory storage is not persistent, so no.

I think you mean that you only want to write the objects to the database every X new posts. I guess this is for speed reasons. But since you need to serialize them sooner or later anyway, you don't actually save any time that way. You will, however, save time by not flushing the new objects to disk on every write, but most databases already support that.

But you also talk about caching the rendered page, which is read caching. You say you can't cache the finished result there, but you can cache the result of the database query. That means new messages will not show up immediately but will take a minute or so to appear, which I think most people would find acceptable.

Update: in that case, no. But you should still easily be able to cache the query results and invalidate that cache when new responses are added. That should help.

Lennart Regebro
I thought it would be clear that by "persistent" I meant "persisting between requests". In fact, your last point is the reason I thought about this in-memory storage: new posts need to show up immediately, so I can't rely on any read cache but need to render the page on (nearly) every request. Sure, some requests could be served from a cached page, but most could not.
Martin
Well, if writing to disk is the bottleneck, then you are in deep shit here. ;) But for reading you should be able to use memcached and invalidate it when new responses are written.
Lennart Regebro
+1  A: 

In a production WSGI environment, you would probably have multiple worker processes serving requests at the same time. These worker processes would be recycled from time to time, meaning local memory objects would be lost.

But if you really need this (and make sure you do), I suggest you look into Django's caching framework, specifically local-memory caching. Also have a look at sessions.
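For reference, in recent Django versions the local-memory backend mentioned here is enabled with a settings fragment like the following (the `LOCATION` value is an arbitrary name chosen for illustration):

```python
# settings.py fragment: Django's built-in local-memory cache backend.
# Note it is per-process: each WSGI worker keeps its own copy.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "thread-pages",  # arbitrary name, illustrative
    }
}
```

With that in place, `django.core.cache.cache.set(...)` and `.get(...)` operate on the in-process store.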

But even the local-memory cache uses serialization (pickle). It is easy to implement a local-memory cache without serialization by writing a custom cache backend (see the docs). You could use the code in locmem.py as a starting point.
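Stripped of Django's backend plumbing, the core of such a no-serialization cache is just a dict holding the objects themselves. A minimal sketch (simplified: no size limit or culling, and the class name is made up, not Django API):

```python
# Sketch of a local-memory cache that stores objects directly instead
# of pickling them -- roughly what a custom backend modelled on
# Django's locmem.py would do internally.
import time

class UnpickledLocMemCache:
    def __init__(self, default_timeout=300):
        self.default_timeout = default_timeout
        self._store = {}  # key -> (expiry timestamp, object)

    def set(self, key, value, timeout=None):
        timeout = self.default_timeout if timeout is None else timeout
        # The object itself is stored, so all callers share one mutable
        # instance -- that is the price of skipping serialization.
        self._store[key] = (time.monotonic() + timeout, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expiry, value = entry
        if time.monotonic() > expiry:
            del self._store[key]
            return default
        return value
```

Because no copy is made, `get` returns the very object passed to `set`, which is exactly the behavior the asker wants and exactly what pickling would prevent.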

But I suspect you are doing a bit of premature optimization here?

codeape