How do I remove everything beginning with '<' and ending with '>' from a string in C#? I know it can be done with regex, but I'm not very good with it.
+1
A:
This is the tag pattern I quickly wrote for a recent small project:
string tagPattern = @"<[!--\W*?]*?[/]*?\w+.*?>";
I used it like this:
MatchCollection matches = Regex.Matches(input, tagPattern);
foreach (Match match in matches)
{
    input = input.Replace(match.Value, string.Empty);
}
It would likely need to be modified to correctly handle script or style tags.
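For comparison, the match-then-replace loop can probably be collapsed into a single `Regex.Replace` call. The following is only a sketch under that assumption: the class name `TagStripper` and the method name `StripTags` are hypothetical and not part of the original answer, and the pattern is copied unchanged, so the same caveat about script and style tags applies.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Same pattern as in the answer, copied verbatim.
    private const string TagPattern = @"<[!--\W*?]*?[/]*?\w+.*?>";

    // Hypothetical helper: removes every match of the pattern in one pass
    // instead of collecting matches and calling string.Replace for each one.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, TagPattern, string.Empty);
    }
}
```

With this sketch, `TagStripper.StripTags("<b>Hello</b> world")` returns `Hello world`. Because `.` does not match a newline by default, a tag that spans several lines would be left in place unless `RegexOptions.Singleline` is added.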
Anthony Pegram
2010-04-09 19:28:05
Worked like a charm
Nick Brooks
2010-04-10 20:24:35
`[!--\W*?]` means "Match a character in the range between `!` and `-`, a non-word character, a `*` or a `?`". Since that character class is optional, it doesn't hurt, but it doesn't fulfil the obviously intended purpose of a negative lookahead, which would be `(?!--)`. The `\W*?` and the following `*?` don't make any sense at all.
Tim Pietzcker
2010-05-18 13:58:40
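The point about the bracket expression can be checked directly. This is a small sketch, with the hypothetical names `CharClassDemo` and `MatchesWhole`, that anchors the bracket expression on its own so it has to match the entire input. It is meant to show that the expression consumes exactly one character from a set and does not look ahead for the literal text `!--`.

```csharp
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // The bracket expression from the answer, anchored so it must match the whole input.
    private const string ClassOnly = @"^[!--\W*?]$";

    // Hypothetical helper: true only when the input is a single character from the class.
    public static bool MatchesWhole(string s)
    {
        return Regex.IsMatch(s, ClassOnly);
    }
}
```

Here `MatchesWhole("+")` is true because `+` lies in the range from `!` to `-`, `MatchesWhole("a")` is false because `a` is a word character outside that range, and `MatchesWhole("!--")` is false because a character class matches one character, not a three-character sequence.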
+1
A:
Non-regex option (but it still won't parse nested tags):
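public static string StripHTML(string line)
{
    // Position of the next '<', or -1 when none remain.
    int beginStrip = line.IndexOf('<');
    while (beginStrip != -1)
    {
        int endStrip = line.IndexOf('>', beginStrip + 1);
        if (endStrip == -1)
        {
            // Unclosed '<': stop here, otherwise Remove gets a bad count or the loop never ends.
            break;
        }
        // Remove everything from '<' through the matching '>' inclusive.
        line = line.Remove(beginStrip, endStrip - beginStrip + 1);
        beginStrip = line.IndexOf('<');
    }
    return line;
}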
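The same idea can also be written as a single pass over the characters, which avoids rescanning and copying the string after every removal. This is a sketch of an alternative and not the code from this answer: the names `SinglePassStripper` and `StripTags` are hypothetical, and it assumes that text following an unclosed `<` should be dropped, while a `>` that appears outside any tag is kept.

```csharp
using System.Text;

public static class SinglePassStripper
{
    // Hypothetical helper: copies every character that is not between '<' and '>'.
    public static string StripTags(string line)
    {
        var result = new StringBuilder(line.Length);
        bool insideTag = false;

        foreach (char c in line)
        {
            if (c == '<')
            {
                insideTag = true;            // start skipping
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;           // tag closed, resume copying
            }
            else if (!insideTag)
            {
                result.Append(c);            // ordinary text, including a stray '>'
            }
        }

        return result.ToString();
    }
}
```

For example, `SinglePassStripper.StripTags("<p>Hi <b>there</b></p>")` returns `Hi there`. Like the loop version, it treats every `<` as the start of a tag, so it is not a substitute for a real HTML parser.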
sweaver2112
2010-04-09 19:41:18
A:
Can you give us an example of what you have and what the final output should be?
OmegaMan
2010-04-09 20:34:17