I have been experimenting with woopra.com, a web analytics tool that requires a piece of JavaScript code to be added to each page in order to function. This is easy enough on more dynamic sites with universal headers or footers, but not for totally static HTML pages.

I attempted to work around it by using a combination of Apache rewrites and SSIs to "wrap" the static HTML with the required code. For example...

I made the following changes to my Apache config:

    RewriteEngine On
    RewriteCond %{REQUEST_URI} !=test.shtml
    RewriteCond %{IS_SUBREQ}  false 
    RewriteRule (.*)\.html test.shtml?$1.html

The test.shtml file contains...

    <script type="text/javascript">
       var XXXXid = 'xxxxxxx';
    </script>
    <script src="http://xxxx.woopra.com/xx/xxx.js"></script>

    <!--#set var="page" value="$QUERY_STRING" -->
    <!--#include virtual= $page -->

The idea was that a request coming in for

    /abc.html

would be redirected to

    /test.shtml?abc.html

The shtml file would then include the original file in the response page.

Unfortunately it doesn't quite work as planned :) Can anyone see what I am doing wrong, or perhaps suggest an alternative approach? Are there any Apache modules that could do the same thing, preferably ones that can be configured on a per-site basis?

Thanks

Peter

+2  A: 

I think that mod_ext_filter is the module you are looking for. You can write a short script, in Perl for example, that inserts the JS code into the pages, and register it to process HTML pages:

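    # Read the page line by line and add the script right after <html>.
    # No \Q...\E in the replacement: it would backslash-escape the inserted text.
    while (<>) {
        s/<html>/<html><script>..../;
        print $_;
    }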

You could even use something like sed to perform the substitution.
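
As a rough illustration of the sed idea, here is a minimal sketch of a filter that reads a page on standard input and writes it back out with a snippet inserted before `</head>`. The function name `inject_snippet` and the `SNIPPET` value are hypothetical placeholders, not the real tracking code, and the sketch assumes the snippet contains no `|`, `&` or backslash characters, which sed would treat specially.

```shell
#!/bin/sh
# Hypothetical placeholder snippet; substitute the real tracking code here.
SNIPPET='<script src="/js/tracker.js"></script>'

# Reads HTML on stdin and writes it to stdout, inserting SNIPPET
# before the first </head> found on each line.
inject_snippet() {
    # "|" is used as the sed delimiter so the slashes in the tags
    # do not need escaping.
    sed "s|</head>|${SNIPPET}</head>|"
}
```

It can be tried by hand with `inject_snippet < page.html`. To use it as a standalone filter script, the file would end with a call to `inject_snippet`, and the `cmd=` parameter of mod_ext_filter's `ExtFilterDefine` directive would point at that script.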

Cd-MaN
A: 

If the pages are static, why would you change them on the fly instead of preprocessing all pages on a site, adding the required piece of JavaScript to each one of them? This is simple and probably more efficient (you probably have more pageviews than pages to change).

This could be done in lots of ways. I would suggest a small Perl script to do the replacement in place.

Pablo Alsina
A: 

@Pablo Alsina

why would you change them on the fly instead of preprocessing all pages on a site

There are a number of reasons why you may want to leave the original static files unchanged.

  1. They may belong to someone else, e.g. you are administratively changing files uploaded by another user.
  2. They may be auto-generated by another system that you don't want to, or cannot, change.
  3. You may want to be able to enable/disable/modify the extra data instantly. You don't want to have to re-parse an entire site every time (it could be hundreds of thousands of pages).
  4. You might be doing it for the technical challenge :-)

Peter

Vagnerr
A: 

OK, the biggest problem with the method above is that it breaks your HTML validity by placing a script tag outside the <html> tags.

I'd agree with the others on a pre-processing run over your HTML files, such as a sed/awk script.

Here's a quick example (assuming the script part can be added before the </head> and that the </head> is at the start of a line):

    #!/bin/bash
    # Insert the tracking snippet before </head> in every .htm/.html file.
    cd /var/webserver/whatever/ || exit 1

    snippet='<script type="text/javascript"> var XXXXid = "xxxxxxx"; </script><script src="http://xxxx.woopra.com/xx/xxx.js"></script>'

    # List the files with </head> at the start of a line, then rewrite each
    # one through a temporary file.
    grep -rl --include='*.htm' --include='*.html' '^</head>' . |
    while IFS= read -r file; do
        sed "s|^</head>|${snippet}</head>|" "$file" > /var/tmp/tempfile.htm &&
            mv /var/tmp/tempfile.htm "$file"
    done
    exit 0

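
One thing a pre-processing run has to watch for is being executed twice, which would insert the snippet a second time. A minimal sketch of a guard, assuming the string `woopra.com` only appears in a page once the snippet has been added (the `MARKER` variable and the `needs_snippet` function are hypothetical names):

```shell
#!/bin/sh
# Assumed marker: a string that is present only after the snippet was inserted.
MARKER='woopra.com'

# Succeeds (exit status 0) only when the file contains a </head> tag
# and does not contain the marker yet.
needs_snippet() {
    grep -q '</head>' "$1" && ! grep -qF "$MARKER" "$1"
}
```

A loop over the files could then call `needs_snippet "$file"` before running sed on it and skip any page that is already tagged.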
Alan Doherty
A: 

You may have a syntax error, since $page is not enclosed in quotes. However, the two main reasons this doesn't work are the following:

  • include virtual should be given a path starting with /; in your example the query string should be /abc.html, not abc.html
  • the rewrite rule should start with the path as well, so the rewrite rule has to be

    RewriteRule ^(.*)\.html /test.shtml?$1.html
    
Alex Lehmann