Certain mail clients let the sender place images directly in the body of an email (instead of as a traditional attachment). When my application receives one of these emails, I need to look at only the text/plain message body and determine whether the sender embedded an inline image.
I'm trying to craft a regex to find image placeholders in the text/plain message body so I can swap them for <img> tags in my own HTML-enabled version of the message. (Wacky, I know, but this is the requirement.)
The problem I'm finding is that the placeholders differ based on the sending mail client. For example, when sent from MS Outlook, the text/plain body of the multi-part message looks like this:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Check out this image:
[cid:[email protected]]
Isn't it cool??
A similar message sent from Gmail is a little bit different:
Content-Type: text/plain; charset=ISO-8859-1
Check out this image:
[image: image001.jpg]
Isn't it cool??
The text/html body and the image/jpeg part with the base64-encoded image follow.
Has anyone done any research on this before and compiled a list or built a RegEx specifically for this purpose?
I realize a more reliable way to achieve my goal is to look at the text/html portion of the message, which seems to be a bit more standardized from the few tests I've done, but unfortunately I don't have access to that in this scenario.
I'm using C#, if that matters to anyone.
Here's a list of text/plain image placeholders I've compiled thus far:
- Gmail: [image: filename.jpg]
- Outlook 2007: [cid:[email protected]]
- Thunderbird 3.0.7: none
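For reference, here is a rough sketch of the kind of thing I have in mind, covering only the two placeholder formats listed above. The class and method names (InlineImagePlaceholders, ContainsInlineImage, ReplaceWithImgTags) are made up for illustration. It assumes the body has already been decoded from quoted-printable, and using the Gmail file name as the src is only a stand-in, since the plain-text part doesn't carry a Content-ID for it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: only knows the Gmail and Outlook 2007 formats listed above.
public static class InlineImagePlaceholders
{
    // Matches "[image: name]" (Gmail) or "[cid:id]" (Outlook 2007).
    private static readonly Regex Placeholder = new Regex(
        @"\[(?:image:\s*(?<name>[^\]\r\n]+)|cid:(?<cid>[^\]\s]+))\]",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // True if the text/plain body contains at least one known placeholder.
    public static bool ContainsInlineImage(string plainText)
    {
        return Placeholder.IsMatch(plainText);
    }

    // Swaps each known placeholder for an <img> tag.
    public static string ReplaceWithImgTags(string plainText)
    {
        return Placeholder.Replace(plainText, m =>
        {
            if (m.Groups["cid"].Success)
            {
                // Outlook style: reuse the Content-ID as a cid: URL.
                string cid = WebUtility.HtmlEncode(m.Groups["cid"].Value);
                return "<img src=\"cid:" + cid + "\">";
            }

            // Gmail style: only a file name is available (assumption: the
            // caller maps it to the real image part or URL afterwards).
            string name = WebUtility.HtmlEncode(m.Groups["name"].Value.Trim());
            return "<img src=\"" + name + "\" alt=\"" + name + "\">";
        });
    }
}
```

With this, a body containing [image: image001.jpg] would come back with an <img> tag in its place, and text without a known placeholder (e.g. from Thunderbird) would pass through unchanged. It obviously breaks as soon as another client uses a different format, which is why I'm hoping someone has a more complete list.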