How do I remove everything beginning with '<' and ending with '>' from a string in C#? I know it can be done with a regex, but I'm not very good with them.

+1  A: 

The tag pattern I quickly wrote for a recent small project is this one.

string tagPattern = @"<[!--\W*?]*?[/]*?\w+.*?>";

I used it like this:

MatchCollection matches = Regex.Matches(input, tagPattern);
foreach (Match match in matches)
{
    input = input.Replace(match.Value, string.Empty);
}

It would likely need to be modified to correctly handle script or style tags.
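
For the literal requirement in the question (drop everything from a '<' up to the next '>'), a single Regex.Replace call with a simpler pattern may be enough. This is only a minimal sketch, not the pattern shown above; the names TagStripper and StripTags are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes every run that starts with '<'
    // and ends at the next '>'.
    public static string StripTags(string input)
    {
        // <[^>]*> : a '<', then any characters except '>', then the closing '>'.
        return Regex.Replace(input, "<[^>]*>", string.Empty);
    }
}
```

Note that this treats any '<' ... '>' pair as a tag, so plain text such as "1 < 2 and 3 > 2" would lose the part between the two brackets. Like the loop above, it removes only the tags themselves, not the contents of script or style elements.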

Anthony Pegram
Worked like a charm
Nick Brooks
`[!--\W*?]` is a character class that means "match one character in the range between `!` and `-`, a non-word character, a `*` or a `?`". Since that character class is optional, it doesn't hurt, but it doesn't fulfil the obviously intended purpose of a negative lookahead (which would be `(?!--)`); the `\W*?` and the following `*?` don't make any sense at all.
Tim Pietzcker
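
One way to check what that character class actually accepts is to anchor it and try it against single characters. A small sketch, assuming nothing beyond the class quoted in the comment; CharClassDemo and MatchesSingle are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // The character class from the comment above, anchored so that the
    // whole input must be exactly one character from the class.
    private static readonly Regex CharClass = new Regex(@"^[!--\W*?]$");

    public static bool MatchesSingle(string s)
    {
        return CharClass.IsMatch(s);
    }
}
```

With this, single characters such as "!", "-", "*" and "?" match, while a letter such as "a" does not, which is consistent with the class being a set of individual characters rather than a lookahead for `--`.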
+1  A: 

Non-regex option, but note that it still won't handle nested tags:

public static string StripHTML(string line)
{
    // Find the first '<'; stop when none remain.
    int beginStrip = line.IndexOf('<');
    while (beginStrip != -1)
    {
        int endStrip = line.IndexOf('>', beginStrip + 1);
        if (endStrip == -1)
        {
            break; // unmatched '<': leave the rest of the text untouched
        }

        // Remove the tag, including both angle brackets.
        line = line.Remove(beginStrip, endStrip - beginStrip + 1);
        beginStrip = line.IndexOf('<');
    }

    return line;
}
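
If nested angle brackets do need to be handled, one possible approach is a single pass that tracks nesting depth and keeps only the characters found outside all brackets. This is a different technique from the IndexOf loop above, shown only as a sketch; NestedTagStripper and StripNested are hypothetical names.

```csharp
using System.Text;

public static class NestedTagStripper
{
    // Hypothetical helper: drops everything inside angle brackets,
    // counting nesting depth so that inner '<' ... '>' pairs are covered too.
    public static string StripNested(string line)
    {
        var result = new StringBuilder(line.Length);
        int depth = 0;

        foreach (char c in line)
        {
            if (c == '<')
            {
                depth++;            // entering a (possibly nested) bracketed run
            }
            else if (c == '>' && depth > 0)
            {
                depth--;            // leaving one level of nesting
            }
            else if (depth == 0)
            {
                result.Append(c);   // outside all brackets: keep the character
            }
        }

        return result.ToString();
    }
}
```

For an input like "a<b<c>d>e" this keeps only "ae". The trade-off is that an unmatched '<' causes everything after it to be dropped, whereas a stray '>' outside any bracket is kept as ordinary text.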
sweaver2112
A: 

Can you give us an example of what you have and what the final output should be?

OmegaMan