Consider the following script, which contains obfuscated email addresses and a function that attempts to replace them with ***
by using regex pattern matching. My script attempts to catch the words "at", "a t", "a.t", or "@",
followed by some text (any domain name), followed by "dot", ".", or "d.o.t",
followed by a TLD.
Input:
$str[] = 'dsfatasdfasdf asd dsfasdf [email protected]';
$str[] = 'I live at school where My address is [email protected]';
$str[] = 'I live at school. My address is [email protected]';
$str[] = 'at school my address is [email protected]';
$str[] = 'dsf a t asdfasdf asd dsfasdf [email protected]';
$str[] = 'd s f d s f a t h o t m a i l . c o m';
function clean_text($text){
$pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU';
return preg_replace($pattern, '***', $text);
}
foreach($str as $email){
echo clean_text($email), "\n"; // print one result per line
}
Expected Output:
dsfatasdfasdf asd dsfasdf dsfdsf***
I live at school where My address is dsfdsf@***
I live at school. My address is dsfdsf@***
at school my address is dsfdsf***
dsf ***
d s f d s f ***
Result:
dsfatasdfasdf asd dsfasdf dsfdsf***
I live ***
I live ***
***
dsf ***
d s f d s f ***
Problem: The pattern matches from the first occurrence of "at" rather than the last one, so the following happens:
input: 'at school my address is [email protected]'
produces: '***'
should produce: 'at school my address is dsfdsf***'
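To see where the match actually begins, a small check along these lines may help. The pattern is the one from clean_text above; the address user@example.com and the variable names are made-up stand-ins, not the original data.

```php
<?php
// Same pattern as in clean_text() above, unchanged.
$pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU';

// Hypothetical sample input (assumption): a leading "at" plus a real address.
$subject = 'at school my address is user@example.com';

// PREG_OFFSET_CAPTURE returns each match as [text, byte offset].
preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE);

$matchText  = $m[0][0];
$matchStart = $m[0][1];

// The regex engine reports the leftmost match, and the lazy (.+) only
// controls where the match ends, not where it starts.
echo "starts at offset $matchStart: $matchText\n";
```

With this input the match starts at offset 0, at the word "at", and runs through ".com", which is why the whole line collapses into ***.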
How can I fix this?
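One direction that might work, as a rough sketch rather than a finished solution: keep the domain part from stepping over another "at"/"@" marker by putting a negative lookahead in front of every character it consumes. The function name clean_text_sketch and the sample addresses are assumptions for illustration, and the TLD list is deliberately cut down to com/net/org, since the catch-all TLD branch in my pattern also matches ordinary words after a full stop.

```php
<?php
// Sketch only: clean_text_sketch() and the sample inputs are hypothetical,
// and only the TLDs com/net/org are handled.
function clean_text_sketch(string $text): string
{
    $sep = '[ ._-]*';                              // optional separators between letters
    $at  = '(?:\ba' . $sep . 't\b|@)';             // "at", "a t", "a.t", "@"
    $dot = '(?:d' . $sep . 'o' . $sep . 't|\.)';   // "dot", "d.o.t", "."
    $tld = '(?:c' . $sep . 'o' . $sep . 'm'
         . '|n' . $sep . 'e' . $sep . 't'
         . '|o' . $sep . 'r' . $sep . 'g)';

    // (?:(?!$at).)+? consumes the domain one character at a time and refuses
    // to step onto another "at"/"@" marker, so a match cannot begin at an
    // earlier "at" and run across the real one.
    $pattern = '/' . $at . '(?:(?!' . $at . ').)+?' . $dot . $sep . $tld . '\b/i';

    return preg_replace($pattern, '***', $text);
}

echo clean_text_sketch('at school my address is user@example.com'), "\n";
echo clean_text_sketch('d s f d s f a t h o t m a i l . c o m'), "\n";
```

With these two made-up inputs, the first line keeps "at school my address is user" and only the part from "@" onward is replaced, while the spaced-out address in the second line is still caught from "a t" onward.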