Is there a way to group a matching element without having that group appear in the resulting match groups? For example, suppose I have a string with two lines:

<td>text 1</td>
<td><a href=whatever>this is</a> text 2</td>

and I want to parse out "text 1" and "this is text 2". What I'm doing now is using this pattern:

<td>(<a href=.+?>)?(.+?(</a>)?.+?)</td>

Basically, I'm grouping the anchor tags so the pattern can match them zero or one time. I don't want those groups to appear in the match results (though I can easily ignore them). Is there a proper way to do this?

+4  A: 

You can use a non-capturing group:

(?:xxx)

A non-capturing group works like a normal group in that you can apply quantifiers such as `?`, `*`, or `+` to it. But it does not capture anything, so it gets no group number and you can't use it for backreferences.
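As a rough illustration (Python's `re` module is an assumption here; the question does not say which regex engine is in use), the sketch below runs a simplified version of the question's pattern twice on the first sample line, once with a capturing group around the anchor tag and once with a non-capturing one. The variable names are made up for the example.

```python
import re

line = "<td>text 1</td>"

# Capturing group around the optional opening tag: it takes a slot in
# the results even when it matches nothing (reported as None).
capturing = re.compile(r"<td>(<a href=.+?>)?(.+?)</td>")

# Same pattern, but the optional tag is wrapped in (?:...), so it can
# still take the ? quantifier without taking a slot in the results.
non_capturing = re.compile(r"<td>(?:<a href=.+?>)?(.+?)</td>")

print(capturing.search(line).groups())      # (None, 'text 1')
print(non_capturing.search(line).groups())  # ('text 1',)
```

The only difference between the two patterns is `(` versus `(?:`; what they match is the same, only the reported groups change.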

Andomar
Thanks, that's what I need. But it looks like it doesn't do what I want if I nest a non-capturing group inside a capturing group... is that not possible?
toasteroven
Specifically for the second example, if I match with `<td>(?:<a href=.+?>)(.+?(?:</a>).+?)</td>`, it doesn't properly match the `</a>`.
toasteroven
In the regex in your comment, `a href` is not optional. Try `<td>(?:<a href=.+?>)?(.+?(?:</a>)?.+?)</td>` instead. By the way, if you're parsing HTML, a regex is a pretty bad approach. Try the HTML Agility Pack instead: http://www.codeplex.com/htmlagilitypack
Andomar
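For anyone trying this out, here is a small sketch of the pattern from the comment above, again assuming Python's `re` module. One thing it shows: a non-capturing group nested inside a capturing group still contributes its matched text to the outer group, so the `</a>` stays in the captured string. The second pattern (`split`) is a hypothetical variant, not something proposed in this thread; it assumes the cell contains no other tags and puts the text on each side of the optional `</a>` in its own group.

```python
import re

rows = [
    "<td>text 1</td>",
    "<td><a href=whatever>this is</a> text 2</td>",
]

# Pattern suggested in the comment: both anchor-tag groups are
# optional and non-capturing, leaving a single capturing group.
suggested = re.compile(r"<td>(?:<a href=.+?>)?(.+?(?:</a>)?.+?)</td>")
single = [suggested.search(row).group(1) for row in rows]
print(single)  # ['text 1', 'this is</a> text 2']

# Hypothetical variant (an assumption for illustration): capture the
# text before and after the optional </a> separately, then join.
split = re.compile(r"<td>(?:<a href=[^>]+>)?([^<]+)(?:</a>)?([^<]*)</td>")
joined = ["".join(split.search(row).groups()) for row in rows]
print(joined)  # ['text 1', 'this is text 2']
```

A single group always captures one contiguous stretch of the input, which is why the first result keeps the closing tag; dropping text from the middle takes either more than one group or a cleanup step afterwards.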
Thanks, I'll check that out.
toasteroven