ansaurus

Question

Answer 1

+1 A:

Just change the expression to non-greedy and reverse the match order:

Dim reg As New Regex("\s\(.+?\)</P>", RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)

Or make it match only one closing parenthesis:

"\s\([^)]+\)</P>"

Or make it match only digits inside the parentheses:

"\s\(\d+\)</P>"

Edit: for the non-greedy sample to work, you need to set the RightToLeft flag on the Regex object.
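A rough sketch of the two character-class variants above, written with Python's `re` module rather than .NET purely for illustration. The sample string is hypothetical, and `re` has no counterpart to `RegexOptions.RightToLeft`, so the non-greedy variant is not shown here:

```python
import re

# Hypothetical sample text with two parenthesised groups before the closing tag.
text = "Some text (123) (321)</p>"

# [^)]+ cannot cross a closing parenthesis, so the match cannot span both groups.
no_paren = re.search(r"\s\([^)]+\)</P>", text, re.IGNORECASE)

# \d+ accepts only digits, which also rules out the space and the parentheses.
digits_only = re.search(r"\s\(\d+\)</P>", text, re.IGNORECASE)

print(repr(no_paren.group(0)))     # ' (321)</p>'
print(repr(digits_only.group(0)))  # ' (321)</p>'
```

Both patterns fail at the first group because `</P>` does not follow `(123)`, so the engine moves on and matches only the last group.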

Fábio Batista 2010-04-08 21:32:37

I made that change, but parsing the text "....(123) (321)</p>" still returns "(123) (321)</p>".

Matt H. 2010-04-08 21:37:58

I just want it to return "(321)</p>".

Matt H. 2010-04-08 21:38:21

Check my suggestions again; I edited them a little. The non-greedy method still works, but it needs an extra flag in RegexOptions.

Fábio Batista 2010-04-08 21:50:06

Answer 2

+1 A:

Dim reg As New Regex("\s\(\d+\)</P>", RegexOptions.IgnoreCase)

Your stumbling block was the insufficient specificity of the . (it matches any character except a newline, including parentheses) and the greediness of the + (it matches as much as possible).

Just be more specific (\d+) or less greedy (.+?).
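To see the greediness in action, here is a minimal sketch using Python's `re` module (chosen only for illustration; the sample string is hypothetical). It compares the greedy `.+` with the lazy `.+?` under ordinary left-to-right matching:

```python
import re

# Hypothetical sample text; the real input in the question is elided.
text = "Some text (123) (321)</p>"

# Greedy: .+ grabs as much as possible, then backtracks to the last ")</p>".
greedy = re.search(r"\s\(.+\)</P>", text, re.IGNORECASE)

# Lazy: .+? grows one character at a time, but the match still starts at the
# first " (" and has to keep growing until ")</p>" finally follows.
lazy = re.search(r"\s\(.+?\)</P>", text, re.IGNORECASE)

print(repr(greedy.group(0)))  # ' (123) (321)</p>'
print(repr(lazy.group(0)))    # ' (123) (321)</p>'
```

In this left-to-right setting the lazy quantifier alone returns the same span as the greedy one, because the leftmost possible start position wins before quantifier greediness is considered.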

Tomalak 2010-04-08 21:34:25

"Less greedy" won't work; RE engines always try to start matching as soon as possible. Being more specific is the correct approach.

Donal Fellows 2010-04-08 21:48:06

I just edited my answer with the non-greedy suggestion. It does work, but you have to set the Regex engine to work backwards (RegexOptions.RightToLeft does this in .NET).

Fábio Batista 2010-04-08 21:49:38

Answer 3

A:

You need to use a lookahead (?= ) to anchor the pattern. It tells the engine where the match must end without making that text part of the match. Here is an example that captures the ( ) data immediately before the closing p tag:

(?:\()([^)]+)(?:\))(?=</[pP]>)


(?:\()        - Match but don't capture a (
([^)]+)       - Capture everything until a ) is hit. [^ ] is a negated character class
(?:\))        - Match but don't capture the )
(?=</[pP]>)   - Lookahead: require, but don't consume, a following </p> or </P>
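A small sketch of the same pattern run through Python's `re` module (used only for illustration; the lookahead and group syntax shown here is shared with .NET, and the sample string is hypothetical):

```python
import re

# Hypothetical sample text with two parenthesised groups before the closing tag.
text = "Some text (123) (321)</p>"

pattern = re.compile(r"(?:\()([^)]+)(?:\))(?=</[pP]>)")
match = pattern.search(text)

# The whole match excludes the tag, because the lookahead consumes nothing.
print(match.group(0))  # (321)
# Group 1 holds only the text between the parentheses.
print(match.group(1))  # 321
```

The first group, `(123)`, is rejected because it is followed by a space rather than the closing tag, so only the last group satisfies the lookahead.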

HTH

OmegaMan 2010-04-09 18:37:12


Regular Expression Help in .NET
