I have overridden a SharePoint page's Render method to cut a script tag out of the HTML sent to the client browser, like this:

    protected override void Render(HtmlTextWriter originalWriter)
    {
        string content = string.Empty;
        using (StringWriter stringWriter = new StringWriter())
        {
            using (HtmlTextWriter htmlWriter = new HtmlTextWriter(stringWriter))
            {
                //render the page to my temp writer
                base.Render(htmlWriter);
                htmlWriter.Close();
                //get page content that would normally be sent to client
                content = stringWriter.ToString();
                stringWriter.Close();
            }
        }
        //replace the script tag
        Regex regex = new Regex(@"<script>.*RTE_ConvertTextAreaToRichEdit.*<"+"/script>");
        content = regex.Replace(content, string.Empty);

        //write modified html to the original writer
        originalWriter.Write(content);
    }

After this change something strange happened: the part of the page that is usually in the upper-right corner and says "Welcome XXX" is not displayed properly. When I view the source of the page, this text is written BEFORE the HTML tag, before any HTML starts. I have been trying to figure out what is going on for the last two days.
Does anyone have any ideas, or has anyone had a similar problem?

+2  A: 

Have you checked your regex? Quantifiers such as * are greedy by default, which means the engine returns the longest match possible.

So if your HTML looks something like this:

<html>
   ...
   <!-- first script element -->
   <script>...RTE_ConvertTextAreaToRichEdit...</script>
   <!-- first script element ends -->

   <!-- second script element -->
   <script>...</script>
   <!-- second script element ends -->
   ...
</html>

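The regex matches everything from the start of the first script element to the end of the second script element (assuming the markup sits on one line or RegexOptions.Singleline is set, because . does not match newlines by default). After the replace, your output would be: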

<html>
   ...
   <!-- first script element -->
   <!-- second script element ends -->
   ...
</html>

You can make the regex non-greedy, or lazy, so that it finds the smallest possible match. Add a ? after each * and that should do it:

Regex regex = new Regex(@"<script>.*?RTE_ConvertTextAreaToRichEdit.*?</script>");

This might solve the problem.
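To see the difference between the greedy and the lazy pattern in isolation, here is a minimal sketch. The sample markup, the class name ScriptStripDemo and the helper Strip are made up for illustration and are not part of the original page; RegexOptions.Singleline is added as an assumption so that . also matches newlines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptStripDemo
{
    // Hypothetical sample markup: two script elements, only the first
    // one mentions RTE_ConvertTextAreaToRichEdit.
    public const string Html =
        "<p>a</p><script>RTE_ConvertTextAreaToRichEdit();</script>" +
        "<p>b</p><script>keepMe();</script><p>c</p>";

    // Removes every match of the pattern. Singleline makes '.' match
    // newlines as well, so the pattern also works on multi-line markup.
    public static string Strip(string html, string pattern)
    {
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Greedy: runs from the first <script> to the LAST </script>.
        string greedy = Strip(Html, @"<script>.*RTE_ConvertTextAreaToRichEdit.*</script>");
        Console.WriteLine(greedy); // <p>a</p><p>c</p>

        // Lazy: stops at the first </script> after the marker text.
        string lazy = Strip(Html, @"<script>.*?RTE_ConvertTextAreaToRichEdit.*?</script>");
        Console.WriteLine(lazy);   // <p>a</p><p>b</p><script>keepMe();</script><p>c</p>
    }
}
```

Note that the lazy pattern can still start at an earlier script element: if another <script> block comes before the one containing RTE_ConvertTextAreaToRichEdit, the match begins at that earlier opening tag and swallows it too.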

spa
Well, this might be it. I don't think so, but it's a good guess anyway. I'll check it.
agnieszka
Oh yes, I forgot to mention that the problem persists even if I don't replace anything (no regex at all) and just render the page to my temp writer and write its content to the original writer. This is really odd, so it's probably not what you've mentioned.
agnieszka
Have you checked where the expected HTML fragment that the Render method outputs ends up? Is it in the right place?
spa
+2  A: 

You may have some luck using the HTML Agility Pack. HTML parsers are better at... parsing... HTML than regexes are.

http://www.codeplex.com/htmlagilitypack

Aidan