I manage a system for academic software projects which, among other things, lets projects provide web pages integrated with an instance of the Trac bug tracker / wiki / source browser. The idea is that the users have freedom to design their main pages as they like (they really like that), but with some convenience/branding features like common navigation elements and an (unobtrusive) link bar across the top to link back to the main page and other hosted projects. For want of a better word, I'm using "wrapping" to describe this insertion of nav elements and the link bar into the documents. I know, I know: overloaded terminology. Sorry if you were expecting a carriage-return question ;)

For several years this system has worked fairly nicely, with the "wrapping" implemented by a user-defined Apache 2 OutputFilter for text/html: as the HTML stream leaves the server, I parse it into a DOM, normalize the tree a bit if needed, and then insert the appropriate extra elements. A bit scrappy, but the best way I could find, and so far it's worked well. However, I now want to upgrade the Trac system to 0.11, which uses some neat AJAX to lazily render directory trees without reloading the page. The AJAX HTML responses also pass through the Apache filter, so a new "top bar" and set of nav furniture get added each time I open a directory. This is obviously pretty sucky. I'd rather not hack in a Trac-specific "ignore the dir browsing HTTP requests" rule, because I'd like my users to be able to use AJAX in their own pages too.

What I want to know is whether anyone has a better way to apply such post-processing to web pages... particularly ways that will intrinsically play a bit nicer with AJAX, without restricting my users' freedom to do what they want with their pages. Thanks!

A: 

as the HTML stream leaves the server, I parse the input HTML into a DOM, normalize the tree a bit if needed, and then insert the appropriate extra elements.

I'm not familiar with this technique, so I have to ask: what exactly does the parsing? Is it a program in some language, a set of rules, or something else? If it's a language, you could check the request headers and, if it is an AJAX request, simply return the stream as is instead of adding the navigation.
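As a rough sketch of that header check, assuming the filter is written in Python and sees CGI-style variables: jQuery (and several other AJAX libraries) add an `X-Requested-With: XMLHttpRequest` header to their requests, which a CGI-style environment would expose as `HTTP_X_REQUESTED_WITH`. This is a convention rather than part of HTTP, so hand-rolled XMLHttpRequest calls may not send it. The names `is_ajax_request`, `apply_filter` and `wrap` are made up for illustration.

```python
def is_ajax_request(environ):
    """Guess whether the request was made via XMLHttpRequest.

    Assumption: the filter sees CGI-style variables, so the
    X-Requested-With header appears as HTTP_X_REQUESTED_WITH.
    """
    value = environ.get("HTTP_X_REQUESTED_WITH", "")
    return value.strip().lower() == "xmlhttprequest"


def apply_filter(html, environ, wrap):
    """Pass AJAX responses through untouched; wrap everything else.

    `wrap` is a hypothetical callable standing in for the existing
    DOM-based code that inserts the link bar and navigation.
    """
    if is_ajax_request(environ):
        return html
    return wrap(html)
```

In a real filter, `environ` would probably be `os.environ`, provided the Apache configuration actually passes the request headers through to the filter process.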

Paolo Bergantino
The filter can be written in any language, and exposes the same environment variables as a CGI script has access to. Personally, I'm using a combination of the Python BeautifulSoup and HTML Tidy APIs to robustly parse and modify the HTML, but that's the solved part of the problem: the sticking point is that the filter is applied to *every* HTTP request. The HTTP headers might be available: is there a reliable aspect of HTTP headers that indicates if a request is AJAX-based?
andybuckley
A: 

What language are you using?

You're going to have a hard time doing this with a DOM, as it changes easily.

Dare I say it: consider using regexes instead.

Evert
The DOM isn't the problem: the parsing and modifying bit is solved (and stable for >2 years). The problem is with identifying which HTTP requests are "normal" and hence should be filtered, and which are AJAX and should be left alone.
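One header-independent way to separate the two cases, sketched here under the assumption that AJAX responses are HTML fragments while "normal" pages are complete documents, is to wrap only when the markup contains an `<html>` or `<body>` start tag. The sketch uses the standard library's `html.parser` rather than BeautifulSoup purely to stay self-contained, and `looks_like_full_page` is a made-up name. Pages that legitimately omit both tags would be skipped by this heuristic.

```python
from html.parser import HTMLParser


class _FullPageDetector(HTMLParser):
    """Records whether an <html> or <body> start tag was seen."""

    def __init__(self):
        super().__init__()
        self.is_full_page = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag in ("html", "body"):
            self.is_full_page = True


def looks_like_full_page(markup):
    """Heuristic: True for complete documents, False for bare fragments."""
    detector = _FullPageDetector()
    detector.feed(markup)
    detector.close()
    return detector.is_full_page
```

A filter could call `looks_like_full_page` on the outgoing stream and leave the response alone when it returns False.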
andybuckley