Ok, so I have this regex:
( |^|>)(((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{2})(-)?( )?)?)([0-9]{7}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{3})(-)?( )?)?)([0-9]{6}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{1})(-)?( )?)?)([0-9]{8})))( |$|<)
It matches Dutch and Belgian phone numbers (I only want those, hence the 31 and 32 country codes).
It's not much fun to decipher, and as you can see it contains a lot of duplication, but it does handle the numbers very accurately.
All of the following European-formatted phone numbers are accepted:
0031201234567
0031223234567
0031612345678
+31(0)20-1234567
+31(0)223-234567
+31(0)6-12345678
020-1234567
0223-234567
06-12345678
0201234567
0223234567
0612345678
and the following incorrectly formatted ones are not:
06-1234567 (a mobile phone number in Holland should have 8 digits after 06)
0223-1234567 (4-digit area code followed by a 7-digit number)
as opposed to this one, which is valid:
020-1234567 (a 3-digit area code is followed by 7 digits, whereas a 4-digit area code can only be followed by 6 digits)
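To make the rules above easy to re-check while simplifying, here is a minimal sketch of a test harness in Python. It assumes Python's `re` module treats the pattern the same way as the tester I used; the names `PHONE_RE`, `ACCEPTED`, `REJECTED` and `is_phone` are just ones I made up for illustration.

```python
import re

# The regex from the top of this post, copied verbatim.
PHONE_RE = re.compile(
    r"( |^|>)(((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{2})(-)?( )?)?)([0-9]{7}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{3})(-)?( )?)?)([0-9]{6}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{1})(-)?( )?)?)([0-9]{8})))( |$|<)"
)

# Examples from this post that should be accepted.
ACCEPTED = [
    "0031201234567", "0031223234567", "0031612345678",
    "+31(0)20-1234567", "+31(0)223-234567", "+31(0)6-12345678",
    "020-1234567", "0223-234567", "06-12345678",
    "0201234567", "0223234567", "0612345678",
]

# Examples from this post that should be rejected.
REJECTED = ["06-1234567", "0223-1234567"]


def is_phone(text):
    """Return True if the regex finds a phone number in text."""
    return PHONE_RE.search(text) is not None


for number in ACCEPTED:
    assert is_phone(number), number
for number in REJECTED:
    assert not is_phone(number), number
```

Any simplified candidate can be dropped in place of `PHONE_RE` and run against the same two lists.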
As you can see, it's the '-' character that makes it a little difficult, but I need it in there because it's part of the formatting people usually use, and I want to be able to parse them all.
Now my question: do you see a way to simplify this regex (or even improve it, if you see a fault in it) while keeping the same rules?
You can test it at regextester.com
(The '( |^|>)' checks that the number is at the start of a word, with the possibility of it being preceded by either a new line or a '>'. I search for the phone numbers in HTML pages.)
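For context, this is roughly how the boundary groups behave when pulling numbers out of HTML. It is a sketch in Python, assuming its `re` module; the HTML snippet and the names `PHONE_RE` and `find_numbers` are made up for illustration. Group 1 is the leading boundary, so the number itself ends up in group 2.

```python
import re

# The regex from the top of this post, copied verbatim.
PHONE_RE = re.compile(
    r"( |^|>)(((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{2})(-)?( )?)?)([0-9]{7}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{3})(-)?( )?)?)([0-9]{6}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{1})(-)?( )?)?)([0-9]{8})))( |$|<)"
)


def find_numbers(text):
    """Return the phone numbers found in text, without the boundary characters."""
    # Group 2 holds the number; group 1 and the last group hold the boundaries.
    return [match.group(2) for match in PHONE_RE.finditer(text)]


# Hypothetical HTML fragment, not taken from a real page.
html = "<td>020-1234567</td><td>+31(0)6-12345678</td>"
numbers = find_numbers(html)

# Two numbers separated by a single space: the first match consumes that space.
adjacent = find_numbers("020-1234567 0223-234567")
```

Here `numbers` contains both numbers from the table cells. In the second call only the first number is found, because the trailing '( |$|<)' has already consumed the space that the next '( |^|>)' would need.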