I need to write a regular expression for the following (NB. ignore carriage returns, I've added them for readability):

<strong>Contact details</strong>
<p><label>Office:</label>&nbsp;+44 (0)12 3456 7890<br />
<label>Direct:</label>&nbsp;+44 (0)12 3456 7890<br />
<label>Mobile:</label>&nbsp;+44 (0)1234 567890<br />
<label>E-mail:</label>&nbsp;<a href="mailto:[email protected]">[email protected]</a><br />

I am using

/([\+\d\(\)\s]+)/

Which matches the number blocks, and I can use an offset of 0-2 to identify them. The problem is that it is returning whitespace as well, which is screwing up my offsets. How do I say "it must contain at least one digit in the match"?
I did also try

/\<label\>Office:\<\/label\>&nbsp;([\+\d\(\)\s]+)\<br \/\>/

But that would return

+44 (0)12 3456 7890<br />
<label>Direct:</label>&nbsp;+44 (0)12 3456 7890<br />
<label>Mobile:</label>&nbsp;+44 (0)1234 567890<br />
<label>E-mail:</label>&nbsp;<a href="mailto:[email protected]">[email protected]</a>
A:

It's not a good idea to parse HTML using regex; use a DOM-based parser instead.
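As a rough illustration of the parser route, here is a minimal sketch in Python. It is an assumption, not what the asker is using (simple_html_dom), and the standard-library `html.parser` is event-based rather than a true DOM, so it only stands in for the idea. The class name `LabelValueParser` and the `values` dictionary are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Sample markup copied from the question (phone lines only).
html = (
    '<strong>Contact details</strong>'
    '<p><label>Office:</label>&nbsp;+44 (0)12 3456 7890<br />'
    '<label>Direct:</label>&nbsp;+44 (0)12 3456 7890<br />'
    '<label>Mobile:</label>&nbsp;+44 (0)1234 567890<br />'
)


class LabelValueParser(HTMLParser):
    """Hypothetical helper: map each <label> text to the text that follows it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; into "\xa0"
        self.in_label = False
        self.current = None  # label whose value we are still waiting for
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.in_label = True
        else:
            self.current = None  # any other tag ends the pending value

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False

    def handle_data(self, data):
        text = data.strip()  # strip() also removes the non-breaking space
        if self.in_label:
            self.current = text.rstrip(":")
        elif self.current is not None and text:
            self.values[self.current] = text
            self.current = None


parser = LabelValueParser()
parser.feed(html)
parser.close()
values = parser.values
```

With the sample above, `values["Office"]` holds the office number with no surrounding whitespace, so no offsets into a list of regex matches are needed.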

Your regex does not work because it's greedy; to make it non-greedy, change

([\+\d\(\)\s]+)

to

([\+\d\(\)\s]+?)

Also, +, ( and ) are treated literally inside a character class, so there is no need to escape them:

([+\d()\s]+?)
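
To tie this back to the question of requiring at least one digit, here is a small sketch. It is written in Python purely for illustration (an assumption; the thread itself only shows bare patterns), and the variable names are made up. The patterns should carry over to PCRE, with `/` escaped where it is used as the delimiter. The first pattern forces the match to begin at a digit (after an optional `+` or `(`), so whitespace-only runs can never match; the second is the label-anchored version with the lazy, unescaped character class.

```python
import re

# Sample markup copied from the question (phone lines only).
html = (
    '<strong>Contact details</strong>'
    '<p><label>Office:</label>&nbsp;+44 (0)12 3456 7890<br />'
    '<label>Direct:</label>&nbsp;+44 (0)12 3456 7890<br />'
    '<label>Mobile:</label>&nbsp;+44 (0)1234 567890<br />'
)

# Optional leading "+" or "(", then a mandatory digit. The optional tail
# must end on a digit, so trailing whitespace is never part of the match.
number_re = re.compile(r'[+(]*\d(?:[+\d()\s]*\d)?')
numbers = number_re.findall(html)
# numbers == ['+44 (0)12 3456 7890', '+44 (0)12 3456 7890', '+44 (0)1234 567890']

# Label-anchored variant: the lazy quantifier stops at the first "<br />".
office_re = re.compile(r'<label>Office:</label>&nbsp;([+\d()\s]+?)<br />')
office = office_re.search(html).group(1)
# office == '+44 (0)12 3456 7890'
```

Because the blocks of pure whitespace no longer match, the offsets 0-2 line up with Office, Direct and Mobile again.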
codaddict
Thanks. I do use simple_html_dom to get this far but I need regexp for the final step of pulling out the number. Thanks for the tips.
Simon