Hi folks,

I have an HTML document in which I want a version number added automatically to all script and style sheet references.

The following should be replaced:

 <link ... href="file.css" media="all" />
 <script ... src="file.js"></script>
 <script ... src="http://myhost.com/file.js"></script>

with this:

<link ... href="file.v123456.css" media="all" />
<script ... src="file.v123456.js"></script>
<script ... src="http://myhost.com/file.v123456.js"></script>

where 123456 is my dynamic version number.

But it should ONLY do that for local files:

<link ... href="http://otherhost.com/file.css" media="all" />

should be left untouched.

So far I have the following regex:

$html = preg_replace('#(src|href)=("|\')(?!http)(?!("|\'| |\+))(.*)\.(css|js|swf)("|\')#Ui', '\\1=\\2\\3\\4.v'.$version .'.\\5\\6', $html);

But it does not work 100%, and I am sure there is a better way of doing this. How would you write it?
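One possible way to tighten the regex, as a minimal sketch rather than a definitive fix: match every quoted `src`/`href` value ending in `.css`, `.js` or `.swf`, and let a callback decide whether the URL is local. The function name `add_version_to_local_refs` and the `$ownHost` parameter are assumptions made for illustration; the pattern also assumes the attribute value contains no whitespace or query string.

```php
<?php
// Hypothetical helper (not from the original post): inserts ".v<version>"
// before the extension of local css/js/swf references only.
function add_version_to_local_refs($html, $version, $ownHost)
{
    // 1: attribute name, 2: quote character, 3: URL without extension, 4: extension
    $pattern = '#\b(src|href)=(["\'])([^"\'\s>]+)\.(css|js|swf)\2#i';

    return preg_replace_callback($pattern, function ($m) use ($version, $ownHost) {
        $url = $m[3];
        // Absolute (http:, https:) or protocol-relative (//host) URL?
        $isAbsolute = preg_match('#^(?:https?:)?//#i', $url) === 1;
        // Absolute URLs count as local only when they start with our own host.
        if ($isAbsolute && stripos($url, $ownHost . '/') !== 0) {
            return $m[0]; // foreign host: leave the reference untouched
        }
        return $m[1] . '=' . $m[2] . $url . '.v' . $version . '.' . $m[4] . $m[2];
    }, $html);
}
```

Called as `add_version_to_local_refs($html, '123456', 'http://myhost.com')`, this works on the raw string, so the rest of the markup (whitespace, self-closing tags) is not touched.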

Edit:

I now have it working with DOMDocument, but it turns out to be pretty slow!

<?php
//------- snip --------------
// Scheme and host of the current request, e.g. "http://myhost.com"
$host    = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on' ? 'https' : 'http') . '://' . $_SERVER['HTTP_HOST'];
$refs    = array();
$version = "v" . $version;
$doc     = new DOMDocument();
$tmpDoc  = $html;

// Parse the page only to find the references; the output is built from the original string
$doc->loadHTML($tmpDoc);
$xpath = new DOMXPath($doc);

// Style sheets: relative URLs, or absolute URLs on our own host
foreach ($xpath->query('/html/head/link/@href') as $href) {
    $ref = $href->value;
    if (!preg_match('/^https?:/', $ref) || strpos($ref, $host) === 0) {
        $refs[$ref] = preg_replace('/\.css$/', '.' . $version . '$0', $ref);
    }
}

// Scripts: same rule
foreach ($xpath->query('//script/@src') as $src) {
    $ref = $src->value;
    if (!preg_match('/^https?:/', $ref) || strpos($ref, $host) === 0) {
        $refs[$ref] = preg_replace('/\.js$/', '.' . $version . '$0', $ref);
    }
}

// Swap old references for versioned ones in the untouched HTML string
$html = str_replace(array_keys($refs), array_values($refs), $tmpDoc);
//------- snip --------------
?>
+3  A: 

Don't use regex, at least not to find the src values. Use the DOM.

var allScriptTags = document.getElementsByTagName("script");
var allLinkTags = document.getElementsByTagName("link");

for(var i=0; i<allScriptTags.length; i++) {
    var srcAttribute = allScriptTags[i].getAttribute("src");

    // ... do something with srcAttribute ...

    // replace it with the modified value
    allScriptTags[i].setAttribute("src", srcAttribute);
}
Luca Matteis
Thanks for that, but I need it in PHP ;-)
Alex
I don't think the API should be much different with a PHP DOM parser.
Luca Matteis
+2  A: 

You would be better off using an HTML parser such as Simple HTML DOM Parser or DOMDocument to find the elements and attributes. Then you can use regular expressions to modify the attribute values.

Here’s an example for DOMDocument:

$version = 'v123456';
$doc = new DOMDocument();
$doc->loadHTML($code);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('/html/head/link/@href') as $href) {
    if (!preg_match('/^https?:/', $href->value)) {
        $href->value = preg_replace('/\.css$/', '.'.$version.'$0', $href->value);
    }
}
foreach ($xpath->query('//script/@src') as $src) {
    if (!preg_match('/^https?:/', $src->value)) {
        $src->value = preg_replace('/\.js$/', '.'.$version.'$0', $src->value);
    }
}
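
Note that the loops above skip every absolute URL, including ones that point at the site's own host (such as http://myhost.com/file.js in the question). A minimal sketch of a check that treats those as local too, assuming the site's host name is known; `is_local_ref` and the host name `myhost.com` are illustrative and not part of the example above:

```php
<?php
// Hypothetical helper: a reference is "local" when it has no host part
// (relative URL) or when its host equals our own host name.
function is_local_ref($ref, $ownHostName)
{
    $host = parse_url($ref, PHP_URL_HOST);
    if ($host === false) {
        return false; // seriously malformed URL: do not touch it
    }
    if ($host === null) {
        return true;  // no host component, e.g. "file.css" or "/js/file.js"
    }
    return strcasecmp($host, $ownHostName) === 0;
}

var_dump(is_local_ref('file.css', 'myhost.com'));                      // bool(true)
var_dump(is_local_ref('http://myhost.com/file.js', 'myhost.com'));     // bool(true)
var_dump(is_local_ref('http://otherhost.com/file.css', 'myhost.com')); // bool(false)
```

In the loops, the condition `!preg_match('/^https?:/', $href->value)` could then become `is_local_ref($href->value, 'myhost.com')`.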
Gumbo
Well, that would work too, but what about performance? I have to do this on every page load, and caching is not an option in my case.
Alex
@Alex: I think you will have to give it a try.
Gumbo
Thanks, I tried it. The replacements it makes are correct, but it changes the structure of my page: it removes whitespace in some places and drops the closing slash on some tags, so <img /> becomes <img>, for example. It probably does not detect my doctype correctly. I will give Simple HTML DOM Parser a try!
Alex
@Alex: How do you output the changed document?
Gumbo
I did $html = $doc->saveHTML();
Alex
Simple HTML DOM Parser does not work at all. I also tried phpQuery, which changes the HTML code just like DOMDocument does...
Alex
@Alex: Well, `saveHTML` converts the document to HTML. When you’re using XHTML, you need to use `saveXML`.
Gumbo
OK, when I use saveXML at least my tags stay correctly closed. But my page still gets messed up because it strips whitespace, etc. Is there any way to alter the tags I want WITHOUT changing anything else? I think a regex would be the best solution, since performance really matters here (~15,000-20,000 visits a day).
Alex
I have edited my post, please have a look :-)
Alex
+1  A: 

Why not something a lot simpler, like:

$version = 12345;
function print_local_css($base_filename)
{
    global $version;
    print '<link rel="stylesheet" type="text/css" href="'.$base_filename.'.v'.$version.'.css" />';
}

function print_local_js($base_filename)
{
    global $version;
    print '<script type="text/javascript" src="'.$base_filename.'.v'.$version.'.js"></script>';
}

It'll be quick, there is one place to change the version, and for any files you don't want versioned you just write the HTML by hand.
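
A possible variant of the same idea, sketched under the assumption that you would rather pass the version in than rely on a global, and return the tag instead of printing it. The names `local_css_tag` and `local_js_tag` and the path `css/main` are made up for this example:

```php
<?php
// Hypothetical variants of print_local_css()/print_local_js():
// the version is a parameter and the tag is returned, not printed.
function local_css_tag($baseFilename, $version)
{
    $href = $baseFilename . '.v' . $version . '.css';
    return '<link rel="stylesheet" type="text/css" href="'
        . htmlspecialchars($href, ENT_QUOTES) . '" />';
}

function local_js_tag($baseFilename, $version)
{
    $src = $baseFilename . '.v' . $version . '.js';
    return '<script type="text/javascript" src="'
        . htmlspecialchars($src, ENT_QUOTES) . '"></script>';
}

echo local_css_tag('css/main', 12345), "\n";
// <link rel="stylesheet" type="text/css" href="css/main.v12345.css" />
```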

Mike Anchor
Unfortunately, I don't have much influence on how the document gets created; I can only go through it before it is output to the client.
Alex