Hi,

I am intrigued as to how singletons work in Google App Engine (or any distributed server environment). Given that your application can be running in multiple processes (on multiple machines) at once, and requests can get routed all over the place, what actually happens under the hood when an app does something like: 'CacheManager.getInstance()'?

I'm just using the (GAE) CacheManager as an example, but my point is, there is a single global application instance of a singleton somewhere, so where does it live? Is an RPC invoked? In fact, how is global application state (like sessions) actually handled generally?

Regards, Shane

+2  A: 

Caches are generally backed by some sort of distributed, replicated cache. For example, GAE uses a custom version of memcached to maintain a shared cache of objects across a cluster while keeping the stored state consistent. In general there are many solutions to this problem, with different tradeoffs between performance and cache coherence (e.g., is it critical that all caches match 100% of the time, must the cache be written to disk to protect against loss, etc.).

Here are some sample products with distributed caching features (most have documentation describing the tradeoffs of various approaches in great detail):

As you can see, there have been many projects that have approached this problem. One possible solution is to simply share a single cache on a single machine; however, most projects make some sort of replication and distributed failover possible.

jsight
Thank you for the cache examples, but I was using the cache as just one example of a singleton in web development. It could just as easily have been the Users singleton. As I say in the answer above, I'm confused by the idea of some singleton object persisting after a request/response phase. It's my understanding that you can't assume anything survives a request, so I'm trying to understand what the singleton actually means, and what it does exactly in a distributed web environment. In the meantime, I will study the code for the cache examples you gave, to see what they do.
Shane
I gave the caching examples, because there is no single answer to this question. If you want the singleton to be per VM (or even per-request), you would not distribute it at all. On the other hand, if you want to share the singleton across multiple VMs and requests, you'd end up using some sort of caching solution. The same is true for session handling. I suspect that the "Singletons" that you are referring to are really just per VM (possibly even per-request) singletons used to access a client for shared resources.
jsight
A: 

I'm not sure about the specifics of GAE, but typically in a web app this size, you'll have multiple processes running over a number of machines (and then load balance between them). Within each process, if you're using a multi-threaded web server, you can be handling multiple requests. This allows you to share objects between requests within the same web server process (a singleton, for example, would be instantiated when the web app process starts).

If the web server is not multi-threaded, but rather multi-process, then you can't share objects between requests as far as I know, without talking to a separate caching process.
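The multi-threaded case above can be sketched roughly as follows. This is only an illustration: `RequestCounter` and `ThreadedServerDemo` are hypothetical names, not GAE classes, and plain threads stand in for a web server's request handlers. Every thread in the one process sees the same instance; a second process would build its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-process singleton: created once when the class is loaded,
// then shared by every request thread inside this one JVM.
final class RequestCounter {
    private static final RequestCounter INSTANCE = new RequestCounter();
    private final AtomicInteger handled = new AtomicInteger();

    private RequestCounter() {}

    static RequestCounter getInstance() {
        return INSTANCE;
    }

    int recordRequest() {
        return handled.incrementAndGet();
    }

    int total() {
        return handled.get();
    }
}

final class ThreadedServerDemo {
    // Simulates `requests` request-handling threads in a single process and
    // returns how many distinct singleton instances those threads observed.
    static int distinctInstancesSeen(int requests) {
        Set<Integer> identities = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            workers[i] = new Thread(() -> {
                RequestCounter counter = RequestCounter.getInstance();
                counter.recordRequest();
                identities.add(System.identityHashCode(counter));
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return identities.size();
    }
}
```

The counter's state lives only in this process's memory, which is why a multi-process setup needs a separate caching process to share anything.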

The GAE docs seem to support what they call "App Caching" which essentially allows you to do the same thing, but it wasn't clear to me from the docs whether they're doing this by using multi-threaded web servers, or some other caching process that is running alongside the web servers.

I'd be intrigued to know if CacheManager.getInstance() always resolves to the same object, or if it's only the same object for requests handled by the same web server. In reality, it doesn't matter as it's only being used to talk to the separate memcached process anyway.

Michael Hart
I would have thought that the very nature of a singleton would mean it has to resolve to the same object instance. When I think of a request, I never expect it to have a process affinity, especially with Google App Engine. I expect one request to go to Australia and the next to go to the US, so on that basis the only reliable method I can think of is an RPC to some sort of central controller which dishes out the object. It just seems so anti-scale to me, hence my interest. The memcache example is a tricky case in a cloud environment spread across multiple data centers.
Shane
+6  A: 

The singletons in App Engine Java are per-runtime, not per-webapp. Their purpose is simply to provide a single point of access to the underlying service (which in the case of both Memcache and Users API, is accessed via an RPC), but that's purely a design pattern for the library - there's no per-app singleton anywhere that these methods access.
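As a rough sketch of that pattern (not the actual App Engine implementation): each runtime holds its own lightweight client object, and the shared state lives only in the backing service. All names here (`RemoteStore`, `FakeRemoteStore`, `CacheClient`, `TwoRuntimesDemo`) are hypothetical, and an in-memory map stands in for the service that would really be reached via RPC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared backend service (e.g. a memcache cluster).
interface RemoteStore {
    void put(String key, String value);
    String get(String key);
}

// In-memory fake so the sketch runs locally; a real one would issue RPCs.
final class FakeRemoteStore implements RemoteStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

// Client facade: holds no application state itself, only a handle to the service.
// In each runtime, something like getInstance() would return one such object.
final class CacheClient {
    private final RemoteStore store;

    CacheClient(RemoteStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        return store.get(key);
    }
}

final class TwoRuntimesDemo {
    // Two separate client objects model the per-runtime singletons of two JVMs.
    static String writeInOneReadInOther() {
        RemoteStore sharedService = new FakeRemoteStore();
        CacheClient runtimeA = new CacheClient(sharedService);
        CacheClient runtimeB = new CacheClient(sharedService);
        runtimeA.put("greeting", "hello");
        return runtimeB.get("greeting");
    }
}
```

The two client objects are different instances, yet a value written through one is readable through the other, because neither of them owns the data.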

Nick Johnson
This is how I predicted it would work, it's nice to have it cleared up though. Cheers. :)
Shane