We run a relatively high-volume content site. Like most content sites, the majority of each page is relatively static. The articles rarely change, making them good candidates for some form of static/edge caching. There are two big problems, though. Secondary page elements (nav, recent content lists, etc.) change pretty frequently, quickly invalidating "full" cached pages. It's also quite common that we include more dynamic bits in a page, like user-specific information.

It would be really neat to have a reverse-proxy/load balancer that post-processed content and let us handle includes at the proxy/edge. The initial request to the backend would return a rough template, then the proxy software could process that template to complete it. The markup might look something like this:

<html>
<body>
  <div id="content">
    Lorem ipsum whackem smackem.
    <%
      dynamic "http://related.content.service/this/story"
    %>
  </div>
  <div id="sidebar">
    <%
      dynamic do |request|
        url = "http://my.user.service/user-widget.html"
        if request.cookies.key?("user_token")
          url = "http://my.user.service/" + request.cookies["user_token"] + "/user-widget.html"
        end

        error_text = "User service not available"
        { :url => url, :timeout => 500, :error => error_text }
      end
    %>
  </div>
</body>
</html>

What you'll see in that example is a small bit of Ruby that determines the included file based on a cookie value, then returns a hash with the URL to pull in, a timeout, and some default text to show in the event of an error. In theory, all the includes could be requested asynchronously as well.
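To make the idea concrete, here is a minimal sketch of what the proxy side might do with those hashes: fetch every include in its own thread, enforce the per-include timeout, and fall back to the error text. The names `resolve_includes`, `DEFAULT_TIMEOUT_MS`, and the `fetcher` callable are hypothetical, and the fetcher is stubbed so the example runs without a network.

```ruby
require "timeout"

# Assumed default for includes that do not specify :timeout (milliseconds).
DEFAULT_TIMEOUT_MS = 1000

# specs:   array of hashes like { :url => ..., :timeout => ms, :error => text }
# fetcher: any callable that takes a URL and returns the fragment body
# Returns the fragments in the same order as the specs.
def resolve_includes(specs, fetcher)
  threads = specs.map do |spec|
    Thread.new do
      limit = spec.fetch(:timeout, DEFAULT_TIMEOUT_MS) / 1000.0
      begin
        Timeout.timeout(limit) { fetcher.call(spec[:url]) }
      rescue StandardError
        # Timeouts and fetch failures both fall back to the error text.
        spec.fetch(:error, "")
      end
    end
  end
  threads.map(&:value) # wait for every include to finish or time out
end

# Stub standing in for a real HTTP client.
stub_fetcher = ->(url) { "<div>fragment from #{url}</div>" }
specs = [
  { :url => "http://related.content.service/this/story" },
  { :url => "http://my.user.service/user-widget.html", :timeout => 500,
    :error => "User service not available" }
]
fragments = resolve_includes(specs, stub_fetcher)
```

Because each include runs in its own thread, the total wait is roughly the slowest include (or its timeout) rather than the sum of all of them.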

My understanding is that Amazon does something like this. Various page components are generated by backend services, with strict timeout limits to ensure overall page speed. I was hoping their CDN service would include something like this, but it's not to be!

There's a W3C spec for Edge Side Includes (ESI) that is almost what I want. There's very little support for it out there, however. It's available through Akamai, there's some Oracle software that does it, and the open source Varnish cache has a very basic implementation. It's also a really ugly XML format.
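For reference, an ESI include is a tag like `<esi:include src="..." onerror="continue"/>` embedded in the page. The sketch below expands a tiny subset of that (only `src` and `onerror`) to show the general shape. It is not a conforming ESI processor, and `expand_esi`, `parse_attrs`, and the `fetcher` callable are hypothetical names.

```ruby
# Matches a self-closing <esi:include .../> tag and captures its attribute text.
ESI_INCLUDE = %r{<esi:include\s+([^>]*?)\s*/>}

# Turns 'src="..." onerror="continue"' into { "src" => "...", "onerror" => "continue" }.
def parse_attrs(text)
  text.scan(/([\w:-]+)="([^"]*)"/).to_h
end

# Replaces each include tag with the body returned by fetcher.call(src).
def expand_esi(html, fetcher)
  html.gsub(ESI_INCLUDE) do
    attrs = parse_attrs(Regexp.last_match(1))
    begin
      fetcher.call(attrs["src"])
    rescue StandardError
      # With onerror="continue" a failed include is dropped silently;
      # otherwise the failure propagates to the caller.
      raise unless attrs["onerror"] == "continue"
      ""
    end
  end
end

# Stub standing in for a real HTTP client.
stub_fetcher = ->(url) { "<ul><li>related story from #{url}</li></ul>" }
page = '<div id="content">Lorem ipsum. ' \
       '<esi:include src="http://related.content.service/this/story" onerror="continue"/></div>'
rendered = expand_esi(page, stub_fetcher)
```

Note that ESI has no equivalent of the cookie-based Ruby block above; any conditional logic has to be expressed in its own tag vocabulary.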

So the question is: what out there will let me do what I want? Is anyone else doing things in this way?

+1  A: 

Set up nginx as a front end, and use SSI to pull in the dynamic parts of the pages. The dynamic source can be an HTTP server, like Apache, or a FastCGI server, for example PHP or Django.

edit:

Many webservers support some form of SSI (Server Side Includes). This feature lets you add some tags to the HTML as a very limited form of scripting, much simpler and faster (and older) than PHP. Using this, you can serve static pages holding most of the content, and for the 'small dynamic parts', an SSI tag references a dynamic page generated somewhere else.

I particularly like nginx as a front end to almost anything. It's wicked fast, light on resources, and hugely scalable (think lighttpd with cleaner and more stable code). The author describes it not as a general-purpose webserver but as a proxy front end. The backends can be an HTTP server (usually Apache) or FastCGI processes (PHP, Python, Perl, whatever), or a farm of either, or both.

The memcached module is amazing. It uses memcached (which is the fastest and most scalable general-purpose distributed hashtable around) to map a URL directly to a webpage, with no disk access involved. Since memcached is accessible from 'outside' the webserver itself, it can be used even with dynamic pages (given a sane URL/resource mapping), but I don't think it would help a lot in your case. In any case, first make it work with SSI; then you can (if necessary) optimise the dynamic part with memcached.
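The URL-to-page lookup can be sketched roughly like this. A plain Hash stands in for memcached, and `serve` and `backend` are hypothetical names. As far as I know the nginx module only reads from memcached, so the backend application is the one that stores the rendered page under its URL.

```ruby
# Stand-in for memcached: URL path => rendered HTML.
cache = {}

# Stand-in for the backend application. It renders the page and stores it
# under its URL so the front end can answer the next request by itself.
backend = lambda do |path|
  html = "<div>rendered #{path}</div>"
  cache[path] = html
  html
end

# Front-end behaviour: answer from the cache when the key exists,
# otherwise pass the request through to the backend.
def serve(path, cache, backend)
  cache.fetch(path) { backend.call(path) }
end

first  = serve("/stories/42", cache, backend) # miss: backend renders and stores
second = serve("/stories/42", cache, backend) # hit: served from the cache
```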

Javier
Can you expand on this answer at all? It doesn't sound like it gives me much of what I want, but it's possible I'm missing something.
MrKurt
Ah, that's slightly more helpful. It wouldn't really help me conditionally include things, though, which makes it less useful for the types of scenarios I'm interested in.
MrKurt
Remember that the 'things' SSI inserts into the page can be dynamically generated by any backend server.
Javier
+1  A: 

I know a few people have written about using nginx SSI with the nginx memcached module to splice together content fragments. It's a lot more limited than something like ESI, but still useful.

Jason Watkins
A: 

So it turns out that Varnish has (and had) basic ESI support that does nearly everything I wanted it to. If anyone needs to do some ESI stuff, Varnish seems to work pretty well for it. It's pretty basic, but still awesome.

MrKurt