I have developed a WordPress plugin that scans a chunk of HTML, detects any email addresses, and replaces them with non-harvestable HTML markup (which JavaScript then re-renders as a real email address for better usability).

So for instance, the function receives:

$content = "Hello john@example.com. How are you today?";

and outputs:

$content = 'Hello <span class="email">john(replace this parenthesis by @)example.com</span>. How are you today?';

My function works fine, but I would now like to offer an option to specify a readable name for the email address. So if the function receives:

$content = "Hello john@example.com(John Doe). How are you today?";

the new output would be:

$content = 'Hello <span class="email" title="John Doe">john(replace this parenthesis by @)example.com</span>. How are you today?';

So the regex should look for parentheses attached to the address and, if found, take what is inside them, add it as an HTML title attribute, remove the parentheses, and then parse the email.

I'm pretty much clueless as to how to make it happen, because the feature is optional (meaning: those parentheses won't always be there).

Any pointers would be helpful. Here is my current code:

function pep_replace_excerpt($content) {
    $addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})/i';
    preg_match_all($addr_pattern, $content, $addresses);
    $the_addrs = $addresses[0];
    for ($a = 0; $a < count($the_addrs); $a++) {
        $repaddr[$a] = preg_replace($addr_pattern, '<span class="email" title="$4">$1(replace this parenthesis by @)$2.$3</span>', $the_addrs[$a]);
    }
    $cc = str_replace($the_addrs, $repaddr, $content);
    return $cc;
}
+1  A: 

An easy option might be to check with strpos for a parenthesis right after the e-mail address, and then use a regex to find the first occurrence of \((.+?)\) after it.
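A rough sketch of that first option, in case it helps. The helper name pep_name_after is made up for illustration, and it uses substr plus strpos rather than a second regex, so treat it as one possible reading of the idea rather than a finished solution:

```php
<?php
// Hypothetical helper (not part of the plugin): returns the text inside
// "(...)" that starts exactly at offset $end, or null if there is none.
function pep_name_after($content, $end) {
    // $end is the offset just past the matched e-mail address.
    if (substr($content, $end, 1) !== '(') {
        return null;
    }
    // Find the first closing parenthesis after the opening one.
    $close = strpos($content, ')', $end);
    if ($close === false) {
        return null;
    }
    return substr($content, $end + 1, $close - $end - 1);
}

$content = 'Hello john@example.com(John Doe). How are you today?';
$addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})/i';

// PREG_OFFSET_CAPTURE gives each match as [text, offset].
preg_match($addr_pattern, $content, $m, PREG_OFFSET_CAPTURE);
$end  = $m[0][1] + strlen($m[0][0]);
$name = pep_name_after($content, $end);
```

Here $name would hold the name when one is attached, and null otherwise, so the title attribute can be added conditionally.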

The other option is to add (\((.+?)\))? to your regex; the trailing question mark makes the whole group optional.
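To see what the optional group does, here is a small check with preg_match on the two sample strings from the question (the variable names $plain and $named are only for this illustration). With preg_match, trailing groups that did not take part in the match are simply left out of the result array:

```php
<?php
$addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})(\((.+?)\))?/i';

preg_match($addr_pattern, 'Hello john@example.com. How are you today?', $plain);
preg_match($addr_pattern, 'Hello john@example.com(John Doe). How are you today?', $named);

var_dump(count($plain)); // int(4): full match plus groups 1 to 3
var_dump(count($named)); // int(6): groups 4 and 5 are present as well
var_dump($named[4]);     // string(10) "(John Doe)"
var_dump($named[5]);     // string(8) "John Doe"
```

So group 4 still carries the parentheses, while group 5 holds only the name.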

Then the full code will look like:

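function pep_replace_excerpt($content) {
    // Group 4 is the optional "(Name)" suffix; group 5 is the name without the parentheses.
    $addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})(\((.+?)\))?/i';
    preg_match_all($addr_pattern, $content, $addresses);
    $the_addrs = $addresses[0];
    $repaddr = array();
    for ($a = 0; $a < count($the_addrs); $a++) {
        // preg_match_all fills groups that did not match with an empty string.
        if ($addresses[4][$a] !== '') {
            // A name was given: emit it as the title attribute.
            $repaddr[$a] = preg_replace($addr_pattern, '<span class="email" title="$5">$1(replace this parenthesis by @)$2.$3</span>', $the_addrs[$a]);
        } else {
            $repaddr[$a] = preg_replace($addr_pattern, '<span class="email">$1(replace this parenthesis by @)$2.$3</span>', $the_addrs[$a]);
        }
    }
    $cc = str_replace($the_addrs, $repaddr, $content);
    return $cc;
}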
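As an alternative, the match-then-str_replace round trip could be collapsed into a single pass with preg_replace_callback. This is only a sketch: the function name pep_replace_with_callback is invented here, the non-capturing (?: ) wrapper and the [^)]+ class are my own choices, and escaping the name with htmlspecialchars is an added assumption, not something the question asks for.

```php
<?php
// Hypothetical variant of the plugin function, for illustration only.
function pep_replace_with_callback($content) {
    // (?: ... )? makes the "(Name)" suffix optional without capturing the
    // parentheses, so group 4 holds only the name itself.
    $pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})(?:\(([^)]+)\))?/i';

    return preg_replace_callback($pattern, function ($m) {
        // $m[4] is absent when no "(Name)" follows the address.
        $title = (isset($m[4]) && $m[4] !== '')
            ? ' title="' . htmlspecialchars($m[4], ENT_QUOTES) . '"'
            : '';
        return '<span class="email"' . $title . '>'
            . $m[1] . '(replace this parenthesis by @)' . $m[2] . '.' . $m[3]
            . '</span>';
    }, $content);
}

echo pep_replace_with_callback('Hello john@example.com(John Doe). How are you today?');
```

Because each match is replaced where it is found, the same address can appear both with and without a name in one text without the replacements interfering with each other.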
raceCh-
Very nice! One thing bothers me: $4 returns the parentheses along with the text. Is there a way to remove them right inside the regex? Otherwise I suppose I can fix that through PHP functions.
pixeline
Thanks :) I edited the regex pattern so that the group contains only the text inside the parentheses.
raceCh-
Hmm, that does not work too well: it seems to fail on the closing parenthesis. I'll keep trying... Here is copy/pastable code: http://phpbin.net/x/196328853
pixeline
Okay, quick fix (I can't check PHP right now, since I'm on my phone): change (.*?) to ([^\)]+). I'll change that in the solution.
raceCh-
I found it! I'll update your answer and accept it. Thanks a lot!
pixeline
You're welcome and thanks for accepting :)
raceCh-