I have an HTTP module that compresses the HTTP output by stripping whitespace from the HTML.

public override void Write(byte[] buffer, int offset, int count)
{
    // Copy only the slice of the buffer that this call is responsible for.
    byte[] data = new byte[count];
    Buffer.BlockCopy(buffer, offset, data, 0, count);
    // Decode the copied slice, not the whole buffer.
    string html = System.Text.Encoding.Default.GetString(data);

    Regex reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}");
    html = reg.Replace(html, string.Empty);

    byte[] outdata = System.Text.Encoding.Default.GetBytes(html);
    _sink.Write(outdata, 0, outdata.Length);
}

How can I exclude all inline scripts from this replacement? This is my regex for matching scripts.

Regex reg = new Regex("<script[^>]*?>[\\w|\\t|\\r|\\W]*?</script>", (RegexOptions.Singleline | RegexOptions.IgnoreCase));
A: 

I really doubt that you want to be using Encoding.Default, which is specific to your system. Aside from that, why not use the more standard way of doing compression, namely gzip the binary data instead of doing text manipulation? I suspect that will have more impact, and isn't nearly as fragile in terms of accidentally breaking the HTML. In addition, you won't need to worry about the encoding any more.
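As a rough illustration of the "gzip the binary data" suggestion, here is a minimal sketch that compresses a byte array with `GZipStream` and never decodes it to text, so no encoding choice is involved. The class and method names (`GzipSketch`, `Compress`) are hypothetical and not part of the original module. In an actual response filter you would also need to set the `Content-Encoding` header, which this sketch leaves out.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipSketch
{
    // Compresses raw bytes with gzip. The data is treated as opaque binary,
    // so the text encoding of the HTML never matters here.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the gzip footer into 'output'.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }
}
```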

Jon Skeet
I already have GZip enabled in IIS. Thanks.
If you're already gzipping, is this really making a significant difference? Have you measured it, along with the performance hit it's taking to do the encoding/decoding/regex replacement? And then there's the risk of getting it slightly wrong...
Jon Skeet
I agree with your comment. It is not improving performance; however, we are using it for SEO.
The closer the HTML elements are to <body>, the more accurate the content is considered to be. All our JS and CSS is loaded through an HTTP handler, and that works fine. Some sections of the website are loaded through an HTTP proxy, and that HTML content contains inline JS.
We don't have control over this HTML. If we compress this content with the HTTP module, we get JS errors, so we have had to disable the module just because of the third-party HTML. I am trying to modify our HTTP module to skip inline JS, and I am looking for help updating our regex. Thanks in advance.
In the meantime, I will try to contact the third party and ask them to provide us with all their JS functions and to remove all inline functions.
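One possible way to skip inline scripts, offered as a sketch and not a tested fix for the module above: put the script pattern first in an alternation and use a `MatchEvaluator` that returns script blocks unchanged and removes everything else that matches. The class name `HtmlWhitespaceStripper` and the group name `script` are hypothetical, and the whitespace branch is a simplified subset of the original pattern (only whitespace runs between `>` and `<`).

```csharp
using System.Text.RegularExpressions;

public static class HtmlWhitespaceStripper
{
    // Alternation order matters: a whole <script> block is tried first, so the
    // whitespace inside it is consumed by that branch and never reaches the
    // whitespace-stripping branch.
    private static readonly Regex Pattern = new Regex(
        @"(?<script><script\b[^>]*>.*?</script\s*>)|(?<=[>])\s{2,}(?=[<])",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Strip(string html)
    {
        // Script matches are written back unchanged; whitespace matches are dropped.
        return Pattern.Replace(html, m =>
            m.Groups["script"].Success ? m.Value : string.Empty);
    }
}
```

A call such as `HtmlWhitespaceStripper.Strip(html)` would replace the `reg.Replace(html, string.Empty)` step. One caveat: a response filter's `Write` may be called once per chunk of output, so a script block split across two calls would not be recognized by this pattern. Buffering the output and running the replacement once at the end is one way around that.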