views: 79

answers: 3
I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry headers, ETags, ...) and, for the purpose of this question, please assume that I have maxed out the opportunities to reduce load that way.

I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
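
If you do go the brute-force route, the priming step itself is small: walk the known URL list and fetch each page through the cache so the response lands in its disk cache. Here is a minimal sketch in Java, assuming the cache listens as an HTTP proxy on port 3128 (the host names, port, and URL list are placeholders):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;
    import java.util.List;

    // Cache-priming sketch: fetch every known URL through the cache so the
    // responses end up in its disk cache. Hosts, port, and URLs are placeholders.
    public class CachePrimer {
        public static void main(String[] args) throws Exception {
            Proxy cache = new Proxy(Proxy.Type.HTTP,
                    new InetSocketAddress("cache.example.com", 3128));

            List<String> urls = List.of(
                    "http://app.example.com/index.jsp",
                    "http://app.example.com/products.jsp");

            for (String u : urls) {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(u).openConnection(cache);
                int status = conn.getResponseCode();          // forces the request
                try (InputStream body = status < 400
                        ? conn.getInputStream() : conn.getErrorStream()) {
                    if (body != null) {
                        // read the full body so the cache stores the whole response
                        body.transferTo(OutputStream.nullOutputStream());
                    }
                }
                conn.disconnect();
                System.out.println(status + " " + u);
            }
        }
    }

In practice a recursive wget pointed at the proxy does the same job with less code; the sketch just shows how little the priming step involves.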

Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I could program this myself, but I would prefer a solution built from the configuration of open-source technologies.

Thanks

A: 

I've used Squid before to reduce load on dynamically created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.

Marc Novakowski
A: 

Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.

It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:

  1. Database query load on your DB server.
  2. Business logic load on your web/application server.

Here is a JSP-specific overview of caching options.

I have seen huge performance increases from simply caching query results. Even a cache with a duration of just 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory caching.
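
To make the idea concrete, here is a minimal sketch of such a short-lived query cache (the key and loader names are hypothetical, and in practice an existing library such as OSCache or Ehcache would be a better fit than hand-rolling one):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Minimal time-based cache sketch: keeps a query result for a fixed
    // number of seconds so repeated page hits do not touch the database.
    // Keys and the loader are placeholders for real DAO calls.
    public class QueryCache<V> {

        private static final class Entry<V> {
            final V value;
            final long expiresAt;
            Entry(V value, long ttlMillis) {
                this.value = value;
                this.expiresAt = System.currentTimeMillis() + ttlMillis;
            }
        }

        private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public QueryCache(long ttlSeconds) {
            this.ttlMillis = ttlSeconds * 1000L;
        }

        public V get(String key, Supplier<V> loader) {
            Entry<V> e = cache.get(key);
            if (e == null || e.expiresAt < System.currentTimeMillis()) {
                e = new Entry<>(loader.get(), ttlMillis);   // run the real query
                cache.put(key, e);
            }
            return e.value;
        }
    }

    // Usage: a 60-second cache in front of an expensive query.
    //   QueryCache<List<Product>> products = new QueryCache<>(60);
    //   List<Product> list = products.get("all-products", dao::findAllProducts);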

Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
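
A rough sketch of what output caching can look like at the servlet layer follows; it is illustrative only (it buffers writer output per URL and skips TTL handling, headers, and encoding), and libraries such as OSCache ship production-quality filters built on the same idea:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpServletResponseWrapper;

    // Output-caching sketch: render a page once, keep the bytes in memory,
    // and serve the stored copy to subsequent requests for the same URL.
    public class OutputCacheFilter implements Filter {

        private final Map<String, byte[]> pages = new ConcurrentHashMap<>();

        public void doFilter(ServletRequest req, ServletResponse res,
                             FilterChain chain) throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            String key = request.getRequestURI();

            byte[] cached = pages.get(key);
            if (cached == null) {
                // First hit: let the JSP render, capturing its output.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                PrintWriter writer = new PrintWriter(new OutputStreamWriter(buffer));
                HttpServletResponseWrapper wrapper =
                        new HttpServletResponseWrapper(response) {
                            public PrintWriter getWriter() { return writer; }
                        };
                chain.doFilter(request, wrapper);
                writer.flush();
                cached = buffer.toByteArray();
                pages.put(key, cached);
            }
            response.getOutputStream().write(cached);   // serve the stored copy
        }

        public void init(FilterConfig config) {}
        public void destroy() {}
    }

The filter would be mapped to the cacheable URLs in web.xml; everything behind those URLs then runs once per cache fill rather than once per request.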

My experience is with ASP, but the exact same mechanisms are available on JSP pages; even a small amount of caching can be expected to yield a 5-10x increase in maximum requests per second.

Peter J
The problem certainly is #1 and #2: the response time is often dozens of seconds (please, don't ask). As mentioned, I cannot address them in the short term (or rather, I am addressing them, but there are a great many of them, they are not JSP-based, and ....). I have clients with US, European, and Asian users, so I would very much like to replicate the cache once I have primed it. For internal corporate users, an Akamai-like service is not appropriate. I'd like to tar and zip the cache and FTP it out to the slaves. In other cases, the cache server, but not the app, needs to sit in a DMZ.
It sounds like you've already settled on the proxy server solution. Your best bet will be serverfault.com for proxy implementation questions. My humble opinion is that the time you'll spend designing and implementing distributed proxy servers would be better spent coding some caching within your application. Cache APIs exist for all the major frameworks.
Peter J
A: 

I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.

If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.
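
For reference, the two tiers map to a few lines of Squid configuration each; the host names below are illustrative, not a drop-in setup. On the origin cache, running as a reverse proxy (accelerator) in front of the app server:

    # origin cache (reverse proxy in front of the J2EE server)
    http_port 80 accel defaultsite=app.example.com
    cache_peer app.example.com parent 8080 0 no-query originserver
    cache_dir ufs /var/spool/squid 10000 16 256

and on each remote, client-site cache, which treats the origin cache as its parent and never fetches directly from the app server:

    # client-site cache
    cache_peer origin-cache.example.com parent 3128 3130 default
    never_direct allow all
    cache_dir ufs /var/spool/squid 10000 16 256

The remote caches then warm themselves from the origin cache as clients browse, which is what makes the "prime once, let the edges fill in" approach workable.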

Jon Moore