views:

68

answers:

5

I want to match all phone numbers that are wrapped between << and >> tags.
This is my regex for phone numbers:

0[2349]{1}\-[1-9]{1}[0-9]{6}

I tried to add a lookahead (and a lookbehind) like (?=(?:>>)), but this didn't work for me.

DEMO

A: 

<<0[2349]{1}-[1-9]{1}[0-9]{6}>>

MrFox
Have you checked the demo? It must support multiple lines and multiple whitespace characters.
shivesh
A: 

Hey,

I asked a similar question some time ago, using brackets ([]) instead of <<>>:

Link here

This should really help. Cheers.

Edit: It should support your demo no problem.

Hal
I think I have a different scenario: I need to match a pattern inside another pattern.
shivesh
A: 

This can easily be done with two regex patterns:

To identify the section:

<<.*?>>

Use the second regex on the matches from the first:

0[2349]-[1-9]\d{6}

Remember to set dot to match new line. I know it isn't exactly what you were asking, but it will work.
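
As a rough illustration of the two-pass idea, here is a minimal Python sketch. The language choice, the helper name phones_in_sections and the sample text are assumptions made for this example, not part of the original answer. It uses the lazy form <<.*?>> so that two sections in the same input are not merged into one match, and re.DOTALL as the "dot matches new line" option.

```python
import re

# Step 1: find each <<...>> section. DOTALL lets "." cross line breaks;
# the lazy ".*?" stops at the first ">>" so separate sections stay separate.
SECTION = re.compile(r"<<.*?>>", re.DOTALL)
# Step 2: the phone number pattern, applied only to the matched sections.
PHONE = re.compile(r"0[2349]-[1-9]\d{6}")


def phones_in_sections(text):
    """Return phone numbers found inside <<...>> sections, in input order."""
    found = []
    for section in SECTION.finditer(text):
        found.extend(PHONE.findall(section.group(0)))
    return found


# Hypothetical sample: two numbers sit outside any section and are ignored.
sample = "02-1111111 << 03-2222222\n   04-3333333 >> 09-4444444 <<09-5555555>>"
print(phones_in_sections(sample))
```

Numbers outside the tags are never seen by the second pattern, which is what keeps each pattern short and readable.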

Arkain
Is it possible to do it with just one regex?
shivesh
@shivesh, as the accepted answer shows, it is possible to do this in one regex, but it also shows the biggest problem with regexes: they very easily become unreadable and hard to maintain. Unless it is strictly necessary, I usually split it up into smaller, easier-to-understand patterns.
Arkain
A: 

I think gnarf's (and Arkain's) suggestion is very sensible – you don't have to use one regex to do all the work.

But if you really want to use one hard-to-read, unportable regex (it works only in .NET, not in other regex engines), here you go:

(?<=<<(?:>?[^>])*)0[2349]{1}\-[1-9]{1}[0-9]{6}(?=(?:<?[^<])*>>)
svick
If I run a regex twice on the input text, isn't that less efficient than running one regex?
shivesh
Maybe, but most of the time readability and maintainability are more important than some minor difference in efficiency. Also, the lookaheads and lookbehinds I used can be quite inefficient, so using two regexes may actually be faster too.
svick
A: 

The following seems to work (as seen on ideone.com):

Regex r = new Regex(@"(?s)<<(?:(?!>>)(?:(0[2349]\-[1-9][0-9]{6})|.))*>>");

Each <<...>> section is a Match, and all phone numbers in that section will be captured in Groups[1].Captures.

How the pattern is constructed

First of all, I simplified your phone number pattern to:

0[2349]\-[1-9][0-9]{6}

That is, the {1} quantifiers are superfluous, so I dropped them (see Using explicitly numbered repetition instead of question mark, star and plus).

Then, let's try to match each <<...>> section. Let's start at:

(?s)<<((?!>>).)*>>

This will match each <<...>> section. Each . that consumes the body is guarded by a negative lookahead (?!>>), so that we don't run past the closing >>.

Then, instead of matching ., we give priority to matching your phone number instead. That is, we replace . with

(phonenumber|.)

Then I simply made some groups non-capturing so that the phone number is captured in \1, and that's pretty much it. The fact that .NET regex stores all captures made by a group in a single match took care of the rest.
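
To make that last point concrete, here is a small Python sketch for contrast; the function name and sample text are assumptions for illustration only. Python's built-in re module accepts the same pattern, but a group in a match exposes a single value rather than a list of captures, so group(1) yields at most one phone number per section. A portable fallback is to rescan the matched section with the phone pattern.

```python
import re

# Same pattern as in the answer, written as a Python raw string.
PATTERN = re.compile(r"(?s)<<(?:(?!>>)(?:(0[2349]\-[1-9][0-9]{6})|.))*>>")
PHONE = re.compile(r"0[2349]\-[1-9][0-9]{6}")


def sections_with_phones(text):
    """For each <<...>> section return (group 1 value, all phones in it)."""
    results = []
    for m in PATTERN.finditer(text):
        single = m.group(1)  # one value only; no per-group capture list here
        every = PHONE.findall(m.group(0))  # rescan the section for all numbers
        results.append((single, every))
    return results


# Hypothetical sample: 04-1111111 lies outside the tags and is skipped.
sample = "<< 02-1234567\n 03-7654321 >> 04-1111111 << 09-2222222 >>"
for single, every in sections_with_phones(sample):
    print(single, every)
```

In .NET, the Captures collection on the group makes the rescan unnecessary, which is why the single-pattern version works there.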

polygenelubricants