Hello, I am a bit puzzled by my Regex results (and still trying to get my head around the syntax). I have been using http://regexpal.com/ to test my expression, and it works as intended there; in C#, however, the result is not what I expected.

Here is a test - an expression of the following: (?=<open>).*?(?=</open>)

on an input string of: <open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>

I would expect a result back of <open>Text 1, <open>Text 2, <open>Text 3... etc.

However, when I do this in C#, it only returns the first match, <open>Text 1.

How do I get all five 'results' back from the Regex?

    Regex exx = new Regex("(?=<open>).*?(?=</open>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    string input = "<open>Text 1</open> Text 2 <open> Text 3 </open> Text 4 <open> Text 5 </open>";
    string result = Regex.Match(input, exx.ToString(), exx.Options).ToString(); 
A: 

Use Regex.Matches instead of Regex.Match.

PS Home:> $s = '<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>'
PS Home:> $re = '(?=<open>).*?(?=</open>)'
PS Home:> @([regex]::Match($s, $re)).Length
1
PS Home:> @([regex]::Matches($s, $re)).Length
3

As the documentation for Regex.Match states:

Searches an input string for a substring that matches a regular expression pattern and returns the first occurrence as a single Match object.

whereas for Regex.Matches:

Searches an input string for all occurrences of a regular expression and returns all the successful matches.
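
Since the question is about C#, here is a minimal sketch of the same idea in that language. It reuses the pattern and options from the question; the class name `OpenTagDemo` and the helper `FindAll` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex, Match and MatchCollection are real .NET types.
public static class OpenTagDemo
{
    // Returns the text of every match of the question's pattern,
    // instead of only the first one as Regex.Match would.
    public static string[] FindAll(string input)
    {
        var regex = new Regex("(?=<open>).*?(?=</open>)",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return regex.Matches(input)      // MatchCollection with all matches
                    .Cast<Match>()       // enumerate it as Match objects
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";

        foreach (string hit in FindAll(input))
        {
            // Brackets make the trailing space in each match visible.
            Console.WriteLine("[" + hit + "]");
        }
        // Prints:
        // [<open>Text 1 ]
        // [<open>Text 3 ]
        // [<open>Text 5 ]
    }
}
```

Note that this yields three matches, the same count as the PowerShell session above: "Text 2" and "Text 4" sit outside the <open> tags, so the pattern never matches them.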

Note: What you're doing here seems very wrong. If what you're dealing with is XML or a similar language, then please don't use regular expressions to parse it. Otherwise you'll go mad once nested structures show up.

Joey
Wow, that easy, thanks, got it to work!
AaronM
In that case, you can improve your karma by upvoting and accepting Johannes' answer (see the up-triangle and checkbox next to this post?).
Tim Pietzcker
I missed the Matches option. Thanks for that link as well. I am doing some basic HTML parsing/scraping, nothing too complex (I think...). I was using a for loop and trawling through the string byte by byte, but thought a Regex would be better (it's certainly a lot less code!). I'll have a good read through that question though.
AaronM
I can accept the answer (and thanks for pointing me to the place, I was looking for some text to click on, not an image!), but I can't upvote unless I become a member :(
AaronM
@Aaron: For HTML scraping you can use the HTML Agility Pack (http://www.codeplex.com/htmlagilitypack) which should be a lot more robust than using regular expressions.
Joey
@Aaron: Two more rep points to go, and you'll be able to upvote :)
Tim Pietzcker
A: 

Do you really want to have <open> at the start of every match? Why not use lookbehind, too?

(?<=<open>).*?(?=</open>)
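
To see what the lookbehind changes, here is a small C# sketch, assuming the same test input and options as in the question. The class `LookbehindDemo` and the method `InnerText` are hypothetical names chosen for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class for the lookbehind variant of the pattern.
public static class LookbehindDemo
{
    // Returns only the text between <open> and </open>;
    // the lookbehind keeps the opening tag out of each match.
    public static string[] InnerText(string input)
    {
        return Regex.Matches(input, "(?<=<open>).*?(?=</open>)",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";

        foreach (string text in InnerText(input))
        {
            // Brackets make the trailing space in each match visible.
            Console.WriteLine("[" + text + "]");
        }
        // Prints:
        // [Text 1 ]
        // [Text 3 ]
        // [Text 5 ]
    }
}
```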
Tim Pietzcker
Ah, good point. It looks bad in the test data I used, but in the real data I am parsing, the opening tag can be helpful. Thanks though. Regex is all new to me, and it will take a bit to get used to. Now Johannes has given me something else to look at as well!
AaronM