I'm building an application with a self-contained HTTP server which can be either accessed directly, or put behind a reverse proxy (like Apache mod_proxy).

So, let's say my application is running on port 8080 and you set up your Apache like this:

ProxyPass /myapp http://localhost:8080
ProxyPassReverse /myapp http://localhost:8080

This will cause HTTP requests coming into the main Apache server that go to /myapp/* to be proxied to my application. If a request comes in like GET /myapp/bar, my application will see GET /bar. This is as it should be.

The problem that arises is in generating URIs that have to be translated from my application's URI-space in order to work correctly via the proxy (i.e. prepending /myapp/).

The ProxyPassReverse directive takes care of this for URIs in HTTP headers (redirects and so forth). But it doesn't handle URIs in the HTML generated by my application, or in static files and templates.

I'm aware of filters like mod_proxy_html, but this is a non-standard Apache module, and in any case, such filters may not be available for other front-end web servers which are capable of acting as a reverse proxy.

So I've come up with a few possible strategies:

  1. Require an environment variable be set somewhere that contains the proxy path, and prepend this to all generated URIs. This seems inelegant; it breaks the encapsulation provided by the reverse proxy.

  2. Put the proxy path in a configuration file for my application. Same objection as above.

  3. Use only relative URIs in my application. This can get somewhat tricky; I would have to calculate the path difference between the current resource and where the link is going and add the appropriate number of ../'es. Seems messy. Another problem is that some things must generate absolute URIs, like RSS feeds and generated emails.

  4. Use some hacky JavaScript on the front-end to munge URIs in the document text. This seems like a really horrible idea from an interoperability standpoint.

  5. Use a single URI-generating function throughout my code, and require "static" files like JavaScript, CSS, etc. to be run through my templating system. This is the idea I'm leaning towards now.

This must be a fairly common problem. How have you approached it in the past? What has worked and what has made things more difficult?

+4  A: 

Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far.

My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html to do it, because there are often too many special cases (e.g. client-side JavaScript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, and ended up revising the module code myself to handle the corner cases. Also keep performance in mind; fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML through another server module.

Here's my recommendation of how to handle this in your code:

First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding.

Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code:

For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial since URL generation using an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code. But the approach you'll want to take is to generate all URLs via code, and then override that method to support transforming non-relative URLs into relative URLs. You can usually search for patterns in your code (like "/, '/, "http://, 'http://) and do a manual search and replace (or if you're really nerdy and have more patience than I do, craft a regex to replace each common case in your source code).

The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and far less brittle.
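To make the "relativizer" idea concrete, here's a minimal sketch in Python. The function name `relativize` and its two parameters are made up for illustration; it assumes both the link target and the current page are given as paths in the application's own URI-space (e.g. `/bar`, never `/myapp/bar`), and it leaves external or already-relative URLs alone.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(target: str, current: str) -> str:
    """Rewrite an app-absolute URL so it is relative to the current page.

    `target` is the link being generated (e.g. "/css/site.css") and
    `current` is the path of the page being rendered (e.g. "/blog/2009/post").
    Both names are hypothetical; nothing here is tied to a framework.
    """
    parts = urlsplit(target)
    # Leave external URLs and already-relative URLs untouched.
    if parts.scheme or parts.netloc or not parts.path.startswith("/"):
        return target

    # Browsers resolve relative links against the "directory" of the page.
    base_dir = posixpath.dirname(current) or "/"
    rel = posixpath.relpath(parts.path, base_dir)

    # relpath() drops a trailing slash; put it back so "/feed/" stays a directory.
    if parts.path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    if parts.query:
        rel += "?" + parts.query
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel
```

With this, a template rendering `/blog/2009/post` can keep writing `relativize("/css/site.css", current_path)` and get `../../css/site.css`, which resolves correctly whether the page was fetched directly or through `/myapp/`. One thing to watch: a request for the bare prefix without a trailing slash (`/myapp`) gives the browser a different base than the app sees, so you'd probably want the proxy to redirect that to `/myapp/`.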

For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too.

For CSS, URLs in CSS are relative to the location of the CSS file (not the calling HTML page) so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good best practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy.

Finally, you'll need to figure out what to do for other oddball static files (like legacy static HTML files sometimes creep in). In general, I recommend the same practice as CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle.

Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you won't be comfortable editing the source code if needed.

BTW, I realize I didn't mention anything about your idea #4 above (JavaScript link fixup). I wouldn't do that-- if the user has JavaScript turned off or (more common) some network problem prevents that JavaScript from loading for some time after the rest of the page loads, then your links won't work. Too risky.

Justin Grant
Thanks for the detailed answer. I was already planning on using a single URI method for all server-side URI construction, so I may end up going with this idea. One potential problem I've thought of is that sometimes it is necessary to generate absolute URIs (including the host), e.g. in an RSS feed or automated email.
friedo
Hmmm, if you need absolute URLs then my favorite approach may not work. Have you looked at the HTTP headers passed into your app from the reverse proxy? Some (although, AFAIK, not all) reverse proxies may pass a header to the proxy-ee letting them know the original URL-- in order to help app authors deal with cases like this.
Justin Grant
Also, if the HTTP header doesn't work out and if the same web servers handle both RSS/email and regular traffic, you could add javascript into each non-RSS/email page which sends a request back to the server with the URL (from the client's perspective) as a query string parameter. Then your server-side handler for that request can parse out the proper host and path, cache it, and your RSS/email handlers can use it to build absolute URLs. If concerned about malicious clients, wait until you hear the same host/path from N different clients.
Justin Grant
... but, before doing something as complicated as my wacky javascript-logger idea above, I'd start thinking about the deployment-time configuration option instead. :-)
Justin Grant
I actually did investigate the header issue. Although `mod_proxy` will send you `X-Forwarded-For` to get the real remote address, it unfortunately does not send the real URI (if it did, I would have just used that instead of asking the question. :) ). Emails and RSS feeds are optional features in this app, so I don't mind requiring additional config information for them. The important bit (for me) is getting up and running immediately.
friedo
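For the host part at least, `mod_proxy` also adds an `X-Forwarded-Host` header carrying the original `Host` the client asked for, so only the path prefix really needs configuring. A rough sketch, assuming the request headers arrive as a plain dict keyed by the canonical header name (the function name `public_host` and the `fallback` parameter are made up for illustration):

```python
from typing import Mapping


def public_host(headers: Mapping[str, str], fallback: str) -> str:
    """Best-effort guess at the host name the client actually used.

    `headers` is assumed to be a dict-like object keyed exactly as
    "X-Forwarded-Host"; real frameworks differ in how they expose headers.
    `fallback` is whatever the app would otherwise use (e.g. its own Host).
    """
    forwarded = headers.get("X-Forwarded-Host", "")
    if forwarded:
        # Chained proxies may append hosts as a comma-separated list;
        # the first entry is the one closest to the client.
        return forwarded.split(",")[0].strip()
    return fallback
```

Only trust this header when the app is reachable solely through the proxy, since a client talking to port 8080 directly can send whatever value it likes.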
The main application I'm currently maintaining sits behind a reverse proxy and uses a config option for the application's generated URIs. All the RSS feeds and email blast links get the URL root from the config file; almost everything else uses relative paths.
Jeff Beck
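A config-driven URL root like the one described in the last comment could look roughly like this. It's only a sketch: `absolute_url` and the `public_root` setting are hypothetical names, and it assumes the configured root is a full URL including the proxy prefix (e.g. `http://www.example.com/myapp`).

```python
from urllib.parse import urlsplit, urlunsplit


def absolute_url(public_root: str, app_path: str) -> str:
    """Join a configured public root with a path from the app's URI-space.

    `public_root` is a hypothetical config value such as
    "http://www.example.com/myapp"; `app_path` is what the application
    itself sees, such as "/feed.rss".
    """
    root = urlsplit(public_root)
    if not (root.scheme and root.netloc):
        raise ValueError("public_root must be a full URL, e.g. http://host/prefix")

    target = urlsplit(app_path)
    # Avoid doubled or missing slashes where the prefix meets the app path.
    path = root.path.rstrip("/") + "/" + target.path.lstrip("/")
    return urlunsplit((root.scheme, root.netloc, path, target.query, target.fragment))
```

Only the features that genuinely need absolute URIs (RSS feeds, emails) would call this, so the app can still start up with no configuration and use relative links everywhere else.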