Hi, I'm using this pattern to replace parts of a string:

var html = "some test string";
var regex = new Regex(@"<(.|\n)+?>", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Multiline);
var result = regex.Replace(html, ?);

This pattern matches every HTML tag (<anything here>) and replaces it with ?. In practice ? is either " " or "", depending on the kind of match. For example, given the HTML markup below:

<a href="www.google.com">Google</a><a href="www.yahoo.com">Yahoo!</a>

the result is something like this:

Google?Yahoo! (here ? should be " ")

And given the HTML markup below:

Buy it now for <b>$279</b><b>.99</b>!

the result is something like this:

Buy it now for ?$279??.99?! (and here ? should be "")

Can anybody help me improve this pattern so that it works properly? Thanks in advance.

UPDATE

OK, I actually couldn't find a way to do what I need with the pattern alone, so I'm using a MatchEvaluator to decide where ? should be "" and where " ". Thanks a lot ;)
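
For anyone landing here later, the MatchEvaluator idea could be sketched roughly as below. This is not the code actually used in the update, just a minimal illustration under stated assumptions: the `TagStripper` class, the `InlineTags` list, and the rule "inline formatting tags become "", every other tag becomes " "" are all hypothetical, and the final whitespace collapse is an extra step added so the two sample inputs come out clean.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Assumption: these tags never separate words, so they are removed without leaving a space.
    private static readonly HashSet<string> InlineTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i", "u", "em", "strong", "span" };

    // Group 1 captures the tag name when there is one (opening or closing tag).
    // Anything else between < and > (comments, doctype) still matches, with group 1 empty.
    private static readonly Regex TagRegex = new Regex(@"<(?:/?(\w+))?[^>]*>");

    public static string Strip(string html)
    {
        // The lambda is the MatchEvaluator: it picks the replacement per match.
        string replaced = TagRegex.Replace(html, m =>
            InlineTags.Contains(m.Groups[1].Value) ? "" : " ");

        // Collapse the runs of spaces left behind by adjacent tags and trim the ends.
        return Regex.Replace(replaced, @"\s+", " ").Trim();
    }
}
```

With these assumptions, `TagStripper.Strip` turns the first sample into `Google Yahoo!` and the second into `Buy it now for $279.99!`. Note that the whitespace collapse also normalizes spacing in the original text, which may or may not be wanted.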

+3  A: 

Try this for your Regex:

Regex r = new Regex(@"<(.|\n)*?>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

And check your options: there's no need to combine Singleline and Multiline. Multiline only changes how ^ and $ behave, and this pattern uses neither.

Russ C
But if the HTML contains any line breaks, then nothing matches.
Sadegh
It shouldn't with RegexOptions.Singleline, since that changes the way . behaves. Give it a try without either of the options; it really does work, as I copied the expression from live code we use.
Russ C
Oh, I'm sorry, that works ;)
Sadegh
Glad I could help!
Russ C
There's no need for the `IgnoreCase` modifier either (no letters in the regex). And the `Singleline` modifier allows `.` to match linefeeds, so you can get rid of that `(.|\n)` obscenity. ;)
Alan Moore
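
To illustrate the comment above, here is a small sketch of the simplified pattern. The class and method names are made up for the example, and replacing every tag with "" is just a placeholder choice; the point is only that with `RegexOptions.Singleline` a plain `.` already matches line breaks, so `(.|\n)` is not needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    public static string StripTags(string html)
    {
        // Singleline lets . match \n, so a tag that spans several lines is still one match.
        var regex = new Regex(@"<.+?>", RegexOptions.Singleline);
        return regex.Replace(html, "");
    }

    public static void Main()
    {
        // The opening tag contains a line break between the name and the attribute.
        string html = "<a\nhref=\"www.google.com\">Google</a>";
        Console.WriteLine(StripTags(html)); // prints "Google"
    }
}
```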
+1  A: 

You can use RegEx Coach (http://www.weitz.de/regex-coach/) or http://gskinner.com/RegExr/ (an online tool) to test your regular expressions and get a feeling for them.

Sebastian
Thanks, but there's no better online regex tester on the web than http://regexhero.net/tester/ ;)
Sadegh