views:

31

answers:

1

Is it possible to exclude <pre> tags from this code igniter compression hook? I don't understand regular expressions well enough to not break my page. I have tried, but it always jacks up the output.

EDIT: This CodeIgniter Compression hook strips all unecisary white space and formatting from the code in order to compress the output. Including <pre> tags that rely on that spacing and formatting to display the code right.

I'm trying to show code examples in a compressed output page.

<?php  if ( ! defined('BASEPATH')) exit('No direct script access allowed');

function compress()
{
    $CI =& get_instance();
    $buffer = $CI->output->get_output();

     $search = array(
        '/\n/',
        '/\>[^\S ]+/s',
        '/[^\S ]+\</s',
        '/(\s)+/s'
      );

     $replace = array(
        ' ',
        '>',
        '<',
        '\\1'
      );

    $buffer = preg_replace($search, $replace, $buffer);

    $CI->output->set_output($buffer);
    $CI->output->_display();
}

?>
+2  A: 

Let's start by looking at the code you're using now.

 $search = array(
    '/\n/',
    '/\>[^\S ]+/s',
    '/[^\S ]+\</s',
    '/(\s)+/s'
  );

 $replace = array(
    ' ',
    '>',
    '<',
    '\\1'
  );

The intention appears to be to convert all whitespace characters to simple spaces, and to compress every run of multiple spaces down to one. Except it's possible for carriage-returns, tabs, formfeeds and other whitespace characters to slip through, thanks to the \\1 in the fourth replacement string. I don't think that's what the author intended.

If that code was working for you (aside from matching inside <pre> elements), this would probably work just as well, if not better:

$search = '/(?>[^\S ]\s*|\s{2,})/`;

$replace = ' ';

And now we can add a lookahead to prevent it from matching inside <pre> elements:

$search = 
  '#(?>[^\S ]\s*|\s{2,})(?=(?:(?:[^<]++|<(?!/?pre\b))*+)(?:<pre>|\z))#`;

But really, this is not the right tool for the job you're doing. I mean, look at that monster! You'll never be able to maintain it, and complicated as it is, it's still nowhere near as robust as it should be.

I was going to urge you to drop this approach and use a dedicated HTML minifier instead, but that one seems to have its own problems with <pre> elements. If that problem has been fixed, or if there's another minifier out there that would meet your needs, you should definitely go that route.

Alan Moore
that is a monster, but it works for my small application, thanks a ton you regex ninja.
jon