views:

82

answers:

5
+1  Q: 

Regex with < and >

OK, I have a file that may or may not contain newlines or carriage returns. Frankly, I need to ignore that. I need to search the document, find every < and its matching >, and remove the tags along with everything inside them. I've been trying to get this to work for a while; my current regex is:

private Regex BracketBlockRegex = new Regex("<.*>", RegexOptions.Singleline);
....
resultstring = BracketBlockRegex.Replace(filecontents, "");

but this doesn't seem to be working because it catches way too much. Any clues? Is there something weird about the < and > symbols in C#?

+7  A: 

Replace

<.*>

with

<.*?>
Sbm007
+1  A: 

Try:

private Regex BracketBlockRegex = new Regex("<.*?>", RegexOptions.Singleline);
jenningj
+4  A: 

Try a negated character class instead of the dot, so the match cannot run past the first >:

<[^>]*>

What you have, <.*>, will match the first < followed by everything up to the last >, whereas what you want is to match to the first one.

RichieHindle
bingo! this got it to work
Arthur
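
The difference between the patterns suggested so far can be checked with a small sketch. The class name TagStripDemo, the helper Strip, and the sample string are made up for illustration; only Regex.Replace and RegexOptions.Singleline come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Hypothetical helper: removes every match of the given pattern.
    // Singleline is kept from the question so that '.' also matches \r and \n.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, "", RegexOptions.Singleline);

    public static void Main()
    {
        // Illustrative input: the first tag is split across two lines.
        string sample = "keep <a\r\nhref='x'> this <b>text</b> too";

        // Greedy: one match running from the first '<' to the last '>'.
        Console.WriteLine("[" + Strip(sample, "<.*>") + "]");

        // Lazy: each match stops at the nearest '>'.
        Console.WriteLine("[" + Strip(sample, "<.*?>") + "]");

        // Negated class: '>' can never be consumed inside a tag.
        Console.WriteLine("[" + Strip(sample, "<[^>]*>") + "]");
    }
}
```

On this input the lazy and negated-class patterns should give the same result, while the greedy one also removes the text between the tags.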
+2  A: 

Regex quantifiers are greedy by default, and you've got a period, which matches ANY character (including newlines, since you set RegexOptions.Singleline). That just so happens to include the greater-than and less-than characters.

Try this...

<[^<>]*>

Arguably the best Regular Expression resource on the Internet.

CptSkippy
Nope, this doesn't work. Remember that I need newline characters and line feeds to be allowed inside the match.
Arthur
@Arthur: It is inclusive of carriage returns and line feeds. Did you mean exclusive? If that's the case then you'd want <[^<>\r\n]*>
CptSkippy
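
To see the two behaviours from this exchange side by side, here is a minimal sketch. The class and method names are hypothetical; the patterns are the two given above. A negated class such as [^<>] matches \r and \n by itself, so RegexOptions.Singleline is not needed for either pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    // [^<>] excludes only '<' and '>', so tags spanning lines are removed too.
    public static string StripAcrossLines(string input) =>
        Regex.Replace(input, "<[^<>]*>", "");

    // Adding \r\n to the class leaves tags that span a line break untouched.
    public static string StripSingleLineOnly(string input) =>
        Regex.Replace(input, @"<[^<>\r\n]*>", "");

    public static void Main()
    {
        // Illustrative input: the first tag contains a CR LF pair.
        string sample = "x <a\r\nb> y <c> z";

        Console.WriteLine("[" + StripAcrossLines(sample) + "]");
        Console.WriteLine("[" + StripSingleLineOnly(sample) + "]");
    }
}
```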
A: 
Walt Stoneburner