I am trying to remove the <br /> tags that appear between the <pre> and </pre> tags. My string looks like this:

string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";

string temp = "`##`";
// Keep swapping one <br/> per match for the placeholder until the pattern no longer matches.
while (Regex.IsMatch(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    str = System.Text.RegularExpressions.Regex.Replace(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", "<pre>$1" + temp + "$2</pre>", RegexOptions.IgnoreCase);
}
// Turn the placeholders into real line breaks.
str = str.Replace(temp, System.Environment.NewLine);

But this replaces all <br/> tags between the first <pre> and the last </pre> in the whole text, so my final outcome is:

str = "Test<br/><pre>\r\nTest\r\n</pre>\r\nTest\r\n---\r\nTest\r\n<pre>\r\nTest\r\n</pre><br/>Test"

I expect my outcome to be

str = "Test<br/><pre>\r\nTest\r\n</pre><br/>Test<br/>---<br/>Test<br/><pre>\r\nTest\r\n</pre><br/>Test"
+2  A: 

Don't use regex to do it.

"Be lazy, use CPAN and use HTML::Sanitizer." -Jeff Atwood, Parsing Html The Cthulhu Way

C#PAN? :)
KennyTM
+3  A: 

If you are parsing whole HTML pages, RegEx is not a good choice - see here for a good demonstration of why.

Use an HTML parser such as the HTML Agility Pack for this kind of work. It also works with fragments like the one you posted.
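For illustration, a minimal sketch of the parser approach might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (PreBreaksWithParser, ReplaceBreaksInsidePre) are made up for this example. Note that the parser may write untouched tags back slightly differently from the input (for example <br/> as <br>), so treat the exact output format as something to check rather than a guarantee.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

public static class PreBreaksWithParser
{
    // Hypothetical helper: replaces every <br> element inside a <pre> with a newline.
    public static string ReplaceBreaksInsidePre(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var breaks = doc.DocumentNode.SelectNodes("//pre//br");
        if (breaks != null)
        {
            foreach (HtmlNode br in breaks)
            {
                // Swap the <br> element for a plain newline text node.
                br.ParentNode.ReplaceChild(doc.CreateTextNode(Environment.NewLine), br);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the XPath only selects br elements that sit inside a pre element, the <br/> tags outside the <pre> blocks are never touched, whatever spelling (<br>, <br/>, <br />) the input uses.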

Oded
I am just trying to parse the string I mentioned above in str.
Ashish
That is not an answer to the question. He asked for a regex and nothing more.
DixonD
@DixonD, Ted asked the wrong question.
jasonbar
A: 
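string input = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
// The second group consumes text in six-character chunks and rejects '<', '/', 'p', 'r', 'e', '>'
// only at fixed positions, so it is a rough approximation of "anything except </pre>".
string pattern = @"<pre>(.*)<br/>(([^<][^/][^p][^r][^e][^>])*)</pre>";
// Each match replaces a single <br/>, so repeat until nothing matches.
while (Regex.IsMatch(input, pattern))
{
    input = Regex.Replace(input, pattern, "<pre>$1\r\n$2</pre>");
}

This will probably work, but you should use the HTML Agility Pack. This pattern will not match <br>, <br /> and so on.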

Kikaimaru
A: 

OK, so I found the issue with my code. The problem was that Regex.IsMatch ended up matching from the first occurrence of <pre> to the last occurrence of </pre>. I wanted each <pre>...</pre> pair to be handled on its own, so I changed my code to:

bool matchFound = false;
foreach (Match regExp in Regex.Matches(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    matchFound = true;
    // Replace every <br/> inside this matched <pre> block with the placeholder.
    str = str.Replace(regExp.Value, regExp.Value.Replace("<br/>", temp));
}

and it worked well. Anyway, thanks to all of you for your replies.
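The same per-block idea can also be sketched in a single pass with a match evaluator, which avoids the placeholder and the loop. This is only a sketch under a few assumptions: <pre> tags carry no attributes and are not nested, and the names PreBreaks and ReplaceBreaksInsidePre are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreBreaks
{
    // Matches one <pre>...</pre> block at a time; the lazy .*? stops at the nearest </pre>.
    // Singleline lets '.' cross line breaks inside the block.
    private static readonly Regex PreBlock = new Regex(
        @"<pre>.*?</pre>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches <br>, <br/> and <br />.
    private static readonly Regex Br = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites <br> variants only inside each <pre> block.
    public static string ReplaceBreaksInsidePre(string html)
    {
        // The evaluator runs once per <pre> block, so text outside the blocks is left alone.
        return PreBlock.Replace(html, m => Br.Replace(m.Value, Environment.NewLine));
    }
}
```

Calling PreBreaks.ReplaceBreaksInsidePre(str) on the string from the question should leave the <br/> tags outside the <pre> blocks in place and turn the ones inside into Environment.NewLine.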

Ashish