I'm not validating emails. What I want to do is find (and then change) three separate types of "email" content in an HTML string:

  1. a plain email: e.g. user@example.com
  2. a mailto href: e.g. <a href="mailto:user@example.com">user@example.com</a>
  3. an aliased href: e.g. <a href="mailto:user@example.com">user's email</a>

I'm then going to transform each example into a custom HTML string that JS will later modify (protection against spam harvesting via Spamspan):

<span class="spamspan">
<span class="u">user</span>
@
<span class="d">example.com</span>
(<span class="t">Spam Hater</span>)
</span>
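A minimal sketch of producing the markup above from its parts. It is written in Python purely for illustration (the site in question runs PHP4), and the function name `spamspan` and its parameters are hypothetical, not part of Spamspan itself.

```python
from html import escape


def spamspan(user, domain, display=None):
    """Build Spamspan-style markup for one address (hypothetical helper)."""
    parts = [
        '<span class="spamspan">',
        '<span class="u">%s</span>' % escape(user),
        '@',
        '<span class="d">%s</span>' % escape(domain),
    ]
    # The display text is optional: only aliased links carry one.
    if display:
        parts.append('(<span class="t">%s</span>)' % escape(display))
    parts.append('</span>')
    return '\n'.join(parts)


print(spamspan('user', 'example.com', 'Spam Hater'))
```

Escaping each part keeps characters such as `<` or `&` in a display value from breaking the surrounding HTML.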

So you can see I also have to find these types of input and parse each email into user, domain and (optionally) a display value. I'm struggling at the moment with the regexes to find these emails; parsing them should be straightforward in PHP.
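One way the three cases might be found with regexes alone, sketched in Python for illustration. The patterns are PCRE-compatible, so they should carry over to `preg_match_all()` with little change. The address pattern is deliberately loose (it is not a validator), and the names `EMAIL`, `MAILTO_RE`, `PLAIN_RE` and `find_emails` are assumptions made for this example.

```python
import re

# Loose address pattern: good enough for finding, not for validating.
EMAIL = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# An <a> whose href starts with mailto:, capturing address and link text.
MAILTO_RE = re.compile(
    r'<a\s[^>]*?href\s*=\s*["\']mailto:(?P<addr>' + EMAIL + r')[^"\']*["\']'
    r'[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL)
PLAIN_RE = re.compile(EMAIL)


def find_emails(html):
    """Return (kind, user, domain, display) tuples, anchors first."""
    found = []
    for m in MAILTO_RE.finditer(html):
        addr, text = m.group('addr'), m.group('text').strip()
        user, domain = addr.split('@', 1)
        if text.lower() == addr.lower():
            found.append(('mailto', user, domain, None))
        else:
            found.append(('aliased', user, domain, text))
    # Blank out the anchors so their addresses are not counted again as plain.
    remainder = MAILTO_RE.sub(' ', html)
    for m in PLAIN_RE.finditer(remainder):
        user, domain = m.group(0).split('@', 1)
        found.append(('plain', user, domain, None))
    return found
```

Handling the anchors first and removing them before the plain pass avoids matching the same address twice. A plain pattern like this still matches addresses that sit inside other attributes, such as form field values.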

Edit: At the moment, I'm locked into PHP4. Will take a look at http://php-html.sourceforge.net/ for parsing HTML.

+1  A: 

You need an HTML parser and an email regex.
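A rough sketch of that combination, using Python's standard-library `html.parser` as a stand-in for whichever parser is available on the server. The class name `EmailFinder` and the tuple layout are assumptions for this example. The parser separates tags from text, and the regex is only applied to text nodes.

```python
import re
from html.parser import HTMLParser

EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')


class EmailFinder(HTMLParser):
    """Collect (kind, address, display) tuples from an HTML string."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._mailto = None  # address of the mailto <a> currently open
        self._text = []      # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if href.lower().startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part.
                self._mailto = href[len('mailto:'):].split('?', 1)[0]
                self._text = []

    def handle_data(self, data):
        if self._mailto is not None:
            self._text.append(data)
        else:
            for addr in EMAIL_RE.findall(data):
                self.found.append(('plain', addr, None))

    def handle_endtag(self, tag):
        if tag == 'a' and self._mailto is not None:
            text = ''.join(self._text).strip()
            if text.lower() == self._mailto.lower():
                self.found.append(('mailto', self._mailto, None))
            else:
                self.found.append(('aliased', self._mailto, text))
            self._mailto = None


finder = EmailFinder()
finder.feed('<p>Mail user@example.com</p><a href="mailto:amy@example.net">Amy</a>')
finder.close()
print(finder.found)
```

Because the regex never sees attribute values, an address inside something like `<input value="...">` is not picked up as a plain email.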

Ignacio Vazquez-Abrams
I had to use http://php-html.sourceforge.net/ (which is what simplehtmldom is based on) because the server runs PHP4 (alas!). Key points: preg_match_all(), substr_replace() and some regexes.
starmonkey
Just ran into a bug - my regex for "plain" emails means that emails inside form fields are converted... I'll need to skip these :)
starmonkey
My solution was to use a regex to pull input fields out of the string and replace each with a string "token" (which the email regexes leave alone), then substitute the original content back in after my email processing is completed. I really need to upgrade the server to PHP5! :)
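The token approach might look roughly like this, sketched in Python for illustration rather than taken from the actual PHP4 code. The token format `###FIELD0###`, the function names, and the choice to protect `<textarea>` as well as `<input>` are all assumptions. The only real requirement is that the token cannot match the email regexes or occur in the page already.

```python
import re

# Form fields to leave untouched (assumed set: <input> and <textarea>).
FIELD_RE = re.compile(
    r'<input\b[^>]*>|<textarea\b.*?</textarea\s*>',
    re.IGNORECASE | re.DOTALL)
TOKEN_RE = re.compile(r'###FIELD(\d+)###')


def protect_fields(html):
    """Swap each form field for a numbered token; return (masked, stash)."""
    stash = []

    def hide(match):
        stash.append(match.group(0))
        return '###FIELD%d###' % (len(stash) - 1)

    return FIELD_RE.sub(hide, html), stash


def restore_fields(html, stash):
    """Put the original form fields back in place of their tokens."""
    return TOKEN_RE.sub(lambda m: stash[int(m.group(1))], html)


def process(html, transform):
    """Run an email transform over html while skipping form fields."""
    masked, stash = protect_fields(html)
    return restore_fields(transform(masked), stash)
```

Here `transform` stands for whatever email rewriting step is in use; it only ever sees the masked string, so addresses inside form fields survive unchanged.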
starmonkey