Hi,

I'm having problems with regular expressions that I got from regexlib. I am trying to do a preg_replace() on some text and want to replace/remove email addresses and URLs (http/https/ftp).

The code I have is:

$sanitiseRegex = array(
    'email' => /'^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/',
    'http' => '/^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$/',        
);

$replace = array(
    'xxxxx',
    'xxxxx'
);

$sanitisedText = preg_replace($sanitiseRegex, $replace, $text);

However, I am getting the following error: Unknown modifier '/', and $sanitisedText is null.

Can anyone see the problem with what I am doing or why the regex is failing?

Thanks

+1  A: 

For a start, your email string is opened incorrectly:

'email' => /'^([a-zA-Z0-9_\-\.
// should be
'email' => '/^([a-zA-Z0-9_\-\.

The other problem is that you are using / both as a character to match and as the delimiter that starts and ends your URL regex, without escaping it inside the pattern. The simplest solution is to use a different delimiter character, i.e.:

'http' => '@^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$@'

What is happening is that PCRE treats '^(http|https|ftp)\:' as the whole pattern, then starts reading modifiers (options). The first character after the 'end' of the pattern is another '/', which is not a valid modifier, hence the error message.
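To see the delimiter problem in isolation, here is a minimal sketch using a cut-down pattern and test URL of my own (not the regexlib originals). It compares a '/'-delimited pattern containing an unescaped '/' with the two usual fixes: switching the delimiter, or escaping each literal '/'. It assumes PHP 7 or later, where preg_match() returns false when the pattern fails to compile.

```php
<?php
// Compare an unescaped '/' inside a '/'-delimited pattern with the two usual fixes.
$url = 'http://www.google.com';

// Broken: PCRE ends the pattern at the second '/', then reads the next '/' as a
// modifier. The @ operator only silences the "Unknown modifier" warning for this demo.
$broken = @preg_match('/^(http|https|ftp)\://www/', $url);
var_dump($broken); // bool(false): the pattern failed to compile

// Fix 1: use a delimiter that does not appear anywhere in the pattern.
$withAt = preg_match('@^(http|https|ftp)\://www@', $url);
var_dump($withAt); // int(1)

// Fix 2: keep '/' as the delimiter and escape every literal '/' inside the pattern.
$escaped = preg_match('/^(http|https|ftp):\/\/www/', $url);
var_dump($escaped); // int(1)
```

If the literal part comes from a variable, preg_quote($str, '/') escapes it for a given delimiter, including the delimiter itself.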

EDIT: A quick change that might fix the not-matching issue. You could try the following instead:

'http' => '@^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?(/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?$@'
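Putting both fixes back into the original preg_replace() call might look like the sketch below. Two assumptions are worth flagging. The email pattern keeps '/' as its delimiter because '@' appears in the pattern itself. The ^ and $ anchors are removed from both patterns on the assumption that the addresses sit inside longer text; with the anchors, a pattern only matches when the whole of $text is a single email address or URL. The sample $text is made up for illustration.

```php
<?php
// Sketch: the question's preg_replace() call with both delimiter fixes applied.
// ASSUMPTION: emails/URLs appear inside longer text, so the ^...$ anchors from
// the regexlib patterns are dropped here.
$sanitiseRegex = array(
    // Emails contain '@', so '/' stays the delimiter (no '/' occurs in this pattern).
    'email' => '/([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)/',
    // '@' delimiter, so the literal '/' characters need no escaping.
    'http'  => '@(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?(/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?@',
);

// Replacements are paired with patterns by array order; the keys are ignored.
$replace = array('xxxxx', 'xxxxx');

// Hypothetical sample input.
$text = 'Mail me at grant@example.com or see http://www.google.com/search?q=php now';
$sanitisedText = preg_replace($sanitiseRegex, $replace, $text);
echo $sanitisedText, "\n"; // Mail me at xxxxx or see xxxxx now
```

Because the patterns are applied in order, the email pattern runs first; that matters only if a URL could itself contain something that looks like an email address.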
Matthew Scharley
Thanks, I should have spotted the first one. However, the second one (the http regex) still fails to find http://www.google.com?
Grant Collins
Then it's an issue with your regex. It's too complicated for me to dig into, but that's where the issue lies.
Matthew Scharley
Fixed it, another typo... need sleep. Thanks again, Matthew.
Grant Collins