Hi,

I've got a challenge that I am hoping that the SO community is able to help me with.

I am trying to parse a lot of HTML documents in my PHP application to remove personal details, such as names, addresses and phone numbers. I can remove most of these details without too much trouble; however, the phone numbers are a real problem for me.

My idea is to take the text from these documents and then use a regex to identify the phone numbers and replace them with another value such as 'xxxx'.

I've got two regexes that I am using: one for UK landline numbers and one for UK cell/mobile numbers.

However, when I try to run them against the text, it just returns an empty string.

I am using the following preg_replace code:

$pattens = array(
        '/^(((\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3})|((\+44\s?\d{3}|\(?0\d{3}\)?)\s?\d{3}\s?\d{4})|((\+44\s?\d{2}|\(?0\d{2}\)?)\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$/',
        '/^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$/'
    );

$replace = array('xxxxx', 'xxxxx');

//do the search for the numbers.
$updatedContents = preg_replace($pattens, $replace, $htmlContents);

At the moment this is causing me a lot of head scratching, as I thought that I had this nailed, but I can't see what's wrong.

I am sure that it is something really simple.

Thanks,

Grant

+2  A: 

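You probably don't want to anchor your regular expressions. Remove the ^ from the beginning and the $ from the end of each pattern. With the anchors in place, a pattern can only match when the entire input string is a phone number, so numbers embedded in the surrounding text are never matched.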
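As a rough sketch of that change, here are the two patterns from the question with only the leading ^ and trailing $ removed. The sample `$htmlContents` string is made up for illustration, and `$patterns` is simply the question's `$pattens` array under a corrected name; the null check is an extra precaution, since preg_replace returns null when the regex engine reports an error.

```php
<?php
// The question's patterns with the ^ and $ anchors removed,
// so they can match numbers anywhere inside the text.
$patterns = array(
    '/(((\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3})|((\+44\s?\d{3}|\(?0\d{3}\)?)\s?\d{3}\s?\d{4})|((\+44\s?\d{2}|\(?0\d{2}\)?)\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?/',
    '/(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}/'
);

// Hypothetical sample input, not taken from the question.
$htmlContents = '<p>Call 020 7946 0018 or 07700 900123.</p>';

// A single replacement string is applied to every pattern in the array.
$updatedContents = preg_replace($patterns, 'xxxxx', $htmlContents);

// preg_replace returns null on error (for example, a backtrack limit hit).
if ($updatedContents === null) {
    $updatedContents = $htmlContents;
}

echo $updatedContents; // <p>Call xxxxx or xxxxx.</p>
```

One thing to watch for: without anchors these patterns can also match a run of digits that is part of a longer number, so it may be worth wrapping them in lookarounds such as (?<!\d) and (?!\d) if that turns out to be a problem in your documents.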

Mark Byers
Just done one test and it seems to have worked; well, it didn't strip out all of the text from the contents. I will be doing some more testing to be sure. Thanks Mark for your help so far.
Grant Collins