ansaurus

Question

Strategies for dealing with URIs when building an application that sits behind a reverse proxy

Answer 1

+4 A:

Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far.

My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html to do it, because there are often too many special cases (e.g. client-side javascript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, but I decided to revise the module code myself to handle the corner cases. Also keep performance in mind; fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML through another server module.

Here's my recommendation of how to handle this in your code:

First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding.

Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code:

For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial since URL generation using an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code. In that case, the approach you'll want to take is to route all URL generation through a single method, and then have that method transform non-relative URLs into relative URLs. You can usually find the existing URLs by searching for patterns in your code (like "/, '/, "http://, 'http://) and doing a manual search and replace (or if you're really nerdy and have more patience than I do, crafting a regex to replace each common case in your source code).

The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and far less brittle.
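To make the "relativizer" idea concrete, here's a minimal sketch in Python. The function name `relativize` and its two parameters are made up for illustration (they don't come from any framework); it assumes you can get at the path of the page currently being rendered, as seen by your app. Because the reverse proxy only adds a prefix in front of your app's paths, the relative depth between two pages inside the app stays the same, which is why this can work without knowing the prefix.

```python
import posixpath


def relativize(target: str, current_path: str) -> str:
    """Rewrite a site-absolute path (e.g. "/images/logo.png") so it is
    relative to the page at current_path (e.g. "/blog/2009/post.html").

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Leave full URLs, protocol-relative URLs and already-relative paths alone.
    if "://" in target or target.startswith("//") or not target.startswith("/"):
        return target

    # Split off any query string or fragment so only the path is rewritten.
    cut = len(target)
    for sep in ("?", "#"):
        idx = target.find(sep)
        if idx != -1:
            cut = min(cut, idx)
    path, suffix = target[:cut], target[cut:]

    # Browsers resolve relative links against the "directory" of the page.
    if current_path.endswith("/"):
        base_dir = current_path
    else:
        base_dir = posixpath.dirname(current_path)

    rel = posixpath.relpath(path, base_dir or "/")

    # relpath drops trailing slashes; put one back if the target had it.
    if path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    return rel + suffix
```

With this in place, existing code can keep its absolute paths and just wrap them, e.g. `relativize("/images/logo.png", "/blog/2009/post.html")` gives `../../images/logo.png`, which still resolves correctly if the proxy serves the app under some extra prefix.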

For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too.

For CSS, URLs in CSS are relative to the location of the CSS file (not the calling HTML page) so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good best practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy.

Finally, you'll need to figure out what to do for other oddball static files (like legacy static HTML files sometimes creep in). In general, I recommend the same practice as CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle.

Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you won't be comfortable editing the source code if needed.

BTW, I realize I didn't mention anything about your idea #4 above (javascript-link-fixup). I wouldn't do that-- if the user has javascript turned off or (more common) some network problem prevents that javascript from loading for some time after the rest of the page loads, then your links won't work. Too risky.

Justin Grant 2009-12-19 23:38:58

Thanks for the detailed answer. I was already planning on using a single uri method for all server-side URI construction, so I may end up going with this idea. One potential problem I've thought of is that sometimes it is necessary to generate absolute URIs (including the host), e.g. in a RSS feed or automated email.

friedo 2009-12-20 04:45:38

Hmmm, if you need absolute URLs then my favorite approach may not work. Have you looked at the HTTP headers passed into your app from the reverse proxy? Some (although, AFAIK, not all) reverse proxies may pass a header to the proxy-ee letting them know the original URL-- in order to help app authors deal with cases like this.

Justin Grant 2009-12-20 04:56:05
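For what it's worth, the header check suggested above might look roughly like the sketch below. This is only an illustration under assumptions: `public_base_from_headers` is a made-up name, and whether any of these headers actually arrive depends entirely on how the proxy is configured. `X-Forwarded-Host` is added by mod_proxy; `X-Forwarded-Proto` and `X-Forwarded-Prefix` are common conventions that usually have to be set explicitly on the proxy, so treat them as optional. The headers are passed in as a plain mapping, so the lookup here is case-sensitive.

```python
from typing import Mapping, Optional


def public_base_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Best-effort guess at the public base URL from proxy headers.

    Hypothetical helper. Returns None when the proxy did not send
    X-Forwarded-Host, so the caller can fall back to configuration.
    """
    host = headers.get("X-Forwarded-Host")
    if not host:
        return None

    # With chained proxies the value can be a comma-separated list;
    # the first entry is the host the client originally asked for.
    host = host.split(",")[0].strip()

    # Assumption: default to plain http when the proxy sends no scheme.
    proto = headers.get("X-Forwarded-Proto", "http").split(",")[0].strip()

    # Assumption: the proxy may be configured to send the path prefix it
    # strips or adds; many setups do not send this at all.
    prefix = headers.get("X-Forwarded-Prefix", "").strip().rstrip("/")
    if prefix and not prefix.startswith("/"):
        prefix = "/" + prefix

    return f"{proto}://{host}{prefix}/"
```

Only trust these headers when the request really came from your own proxy, since any client can send them directly.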

Also, if the HTTP header doesn't work out and if the same web servers handle both RSS/email and regular traffic, you could add javascript into each non-RSS/email page which sends a request back to the server with the URL (from the client's perspective) as a query string parameter. Then your server-side handler for that request can parse out the proper host and path, cache it, and your RSS/email handlers can use it to build absolute URLs. If concerned about malicious clients, wait until you hear the same host/path from N different clients.

Justin Grant 2009-12-20 06:42:05

... but, before doing something as complicated as my wacky javascript-logger idea above, I'd start thinking about the deployment-time configuration option instead. :-)

Justin Grant 2009-12-20 06:44:10

I actually did investigate the header issue. Although `mod_proxy` will send you `X-Forwarded-For` to get the real remote address, it unfortunately does not send the real URI (if it did, I would have just used that instead of asking the question. :) ). Emails and RSS feeds are optional features in this app, so I don't mind requiring additional config information for them. The important bit (for me) is getting up and running immediately.

friedo 2009-12-20 08:27:55

The main application I'm currently maintaining sits behind a reverse proxy and goes with a config option for the application's generated URIs. All the RSS feeds and email blast links get the URL root from the config file; most everything else uses relative paths.

Jeff Beck 2009-12-25 22:25:10
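A minimal sketch of that config-option approach for the absolute-URL cases (RSS, email), in Python. The environment variable name `APP_PUBLIC_BASE_URL` and the function `absolute_url` are assumptions made up for this example; a real app would read the value from whatever config mechanism it already has.

```python
import os
from typing import Optional
from urllib.parse import urljoin


def absolute_url(path: str, base_url: Optional[str] = None) -> str:
    """Build a public absolute URL from a configured root and an app path.

    Hypothetical helper: base_url falls back to the APP_PUBLIC_BASE_URL
    environment variable, a name invented for this sketch.
    """
    base = base_url or os.environ.get("APP_PUBLIC_BASE_URL")
    if not base:
        raise RuntimeError("public base URL is not configured")

    # urljoin treats a base without a trailing slash as a file and would
    # drop its last segment, so make sure the base ends in "/".
    if not base.endswith("/"):
        base += "/"

    # Strip the leading slash so the path is joined under the base's
    # prefix instead of replacing it.
    return urljoin(base, path.lstrip("/"))
```

For example, with the root configured as `https://example.com/myapp`, `absolute_url("/feeds/rss.xml")` yields `https://example.com/myapp/feeds/rss.xml`. Since only the feed and email code paths call this, the rest of the app can keep using relative URLs and run without the setting.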
