Well, I think I have a very basic doubt here:

I'm developing an app on GAE (Java) that runs a datastore query returning a lot of entities, so I need to cache the result. I was using memcache and it worked great, but if I keep the list of entities in a static variable, the whole request runs twice as fast as with memcache. I think that's because I'm not deserializing the entities on every request.

What would be the drawback of using a static variable instead of memcache? I don't know whether there can be several instances of my application in the cloud, and thus several copies of my static variable.

The list of entities I'm trying to cache is the best (highest-scoring) posts of the last week. I take that list, choose 5 random posts, and show them on a couple of pages.

Thanks for the help!

+2  A: 

Yeah, there is no guarantee that requests from different users will be served by the same instance. In the worst case you could end up constantly reloading the data into a static. Memcache is shared across instances, so the data is more likely to be available there. I would just use memcache; then your app shouldn't have scaling issues in the future.

drudru
+5  A: 

App Engine scales by creating new instances of your application as the number of users hitting it increases. As drudru said, different users might be served by different instances. In general, memcache is the fastest place to store something you want to be globally consistent. However, in your case there may be some room for improvement.

You mention you have a list of posts and you randomly choose 5 to show to users. Does it matter if 2 different users see a different set of 5? If you're choosing random ones anyway, maybe it doesn't matter. Then you could store the full list of posts in memcache, and pull 5 random ones out of memcache and store them in a static variable.

Second, what exactly are you memcaching, and how are you pulling it out? Are you storing a whole bunch of full posts in memcache, getting them all, then choosing 5? Maybe you could just download the list of posts, choose 5, and only get the 5 you need? If you think it's the deserializing that's slowing you down, this might help. Are you doing any processing on the posts after you get them? If so, could the results of that processing be cached?
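
One way to picture "only get the 5 you need" is to cache a small list of post IDs separately from the posts, pick 5 IDs at random, and fetch just those entries. The sketch below is only an illustration under assumptions: the `post:<id>` key scheme, the class and method names, and the plain `Map` standing in for the cache are all hypothetical, not anything from the question's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper: names and key scheme are assumptions for illustration.
class RandomPostPicker {

    // Choose up to n distinct IDs at random without touching the posts themselves.
    static List<Long> pickIds(List<Long> allIds, int n, Random rnd) {
        List<Long> copy = new ArrayList<>(allIds);   // leave the cached ID list untouched
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, Math.min(n, copy.size())));
    }

    // Fetch only the chosen posts. The Map is a stand-in for a cache where each
    // post is stored under its own key ("post:<id>"), so only n entries are read.
    static List<String> fetchPosts(Map<String, String> cache, List<Long> ids) {
        List<String> posts = new ArrayList<>();
        for (Long id : ids) {
            String post = cache.get("post:" + id);
            if (post != null) {                      // skip entries the cache has dropped
                posts.add(post);
            }
        }
        return posts;
    }
}
```

With this layout only the ID list and 5 posts are deserialized per request instead of the whole list; entries missing from the cache would still need a fallback to the datastore.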

Peter Recore
Yes: I'm memcaching the whole list of posts and getting them all, then choosing 5. It would be faster (and smarter!) if I got only the 5 I want. As you said, it doesn't matter if 2 different users see a different set of 5. In fact, if one user reloads the page, the set will be different, so maybe I could keep using the static var? I really don't care if there are several instances of the list that are different. Thanks Peter!!
Damian
If you want to try to get every last bit of performance possible, you could try two levels of cache. When a request comes in, you would first check to see if you have a valid value in your static variable cache, and if not, you would check memcache. If there's nothing valid in memcache, then you would grab the data from the datastore, and populate both memcache and your static variable.
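
A minimal sketch of the two-level lookup described above, under stated assumptions: `SharedCache` is a hypothetical interface standing in for memcache, `MapSharedCache` is an in-memory stand-in so the example runs anywhere, the `loader` stands in for the datastore query, and the time-based expiry is just one possible way to decide whether the local value is still "valid". In a real app a single `TwoLevelCache` would be held in a static field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstraction over a shared cache such as memcache.
interface SharedCache {
    Object get(String key);
    void put(String key, Object value);
}

// In-memory stand-in for the shared cache; counts reads for demonstration.
class MapSharedCache implements SharedCache {
    private final Map<String, Object> map = new HashMap<>();
    int reads = 0;
    public synchronized Object get(String key) { reads++; return map.get(key); }
    public synchronized void put(String key, Object value) { map.put(key, value); }
}

class TwoLevelCache<T> {
    private final SharedCache shared;
    private final String key;
    private final Supplier<T> loader;   // stands in for the datastore query
    private final long ttlMillis;       // assumed validity rule for the local copy

    private T local;                    // level 1: lives only in this JVM
    private long localExpiresAt;

    TwoLevelCache(SharedCache shared, String key, Supplier<T> loader, long ttlMillis) {
        this.shared = shared;
        this.key = key;
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    @SuppressWarnings("unchecked")
    synchronized T get() {
        long now = System.currentTimeMillis();
        if (local != null && now < localExpiresAt) {
            return local;                       // level 1 hit: no deserialization
        }
        T value = (T) shared.get(key);          // level 2: shared cache
        if (value == null) {
            value = loader.get();               // last resort: the slow source
            shared.put(key, value);             // populate the shared cache
        }
        local = value;                          // populate the local copy
        localExpiresAt = now + ttlMillis;
        return value;
    }
}
```

A fresh instance starts with an empty local copy, so its first request falls through to the shared cache rather than the datastore.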
Peter Recore
One last thing - how much does this one operation affect your total page load time? If you get this operation to go from 10 ms to 5 ms, that's cool, but if you have some other operation that takes 300ms, you should focus your energy there first :)
Peter Recore
From what I've seen, this operation takes up a significant part of the load time. Thanks Peter for your answers!!
Damian
+3  A: 

You cannot rely on static variables (or anything else in JVM memory) to be around when the next request hits, because Google is free to start and stop virtual machines whenever it likes. From the looks of it, it seems to prefer starting additional JVMs over additional threads in the same JVM, which compounds the issue.

But you should be able to use static variables as a cache layer, provided you have a way to reload the data from somewhere else if it goes away.

I would also not go overboard with memory usage there; there must be a quota on how much memory you can use.
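
One simple way to keep an in-JVM cache from growing without limit is to cap the number of entries and evict the least recently used one. The sketch below uses the standard `LinkedHashMap` hook for that; the class name `BoundedCache` is hypothetical, and capping by entry count is only a rough proxy for memory, since it says nothing about the size of each entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-capped cache: evicts the least recently used entry
// once more than maxEntries are stored.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);   // true = iterate in access order (LRU first)
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }
}
```

`LinkedHashMap` is not thread-safe, so a cache like this shared through a static field would need external synchronization, for example by wrapping it with `Collections.synchronizedMap`.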

Thilo