I need to take an arbitrary string and make it safe to use as part of an email address, replacing characters if necessary. This is to submit text to an API that requires values to be encoded into the address the email is being sent to. However, this is fraught with pitfalls and I'd like to make sure that whatever is input isn't going to break mail delivery. I'm already planning on just double-quoting the entire local part, which is going to mitigate a lot of the issues.
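A minimal sketch of the double-quoting approach, assuming the input is plain printable ASCII. The helper name `quote_local_part` is hypothetical. Inside a quoted local part, only the backslash and the double quote need a backslash escape; this sketch rejects control characters and non-ASCII input rather than guessing at a replacement.

```python
def quote_local_part(text: str) -> str:
    """Wrap text in double quotes for use as a quoted local part.

    Hypothetical helper: escapes backslash and double quote, and
    rejects anything outside printable ASCII (space through tilde).
    """
    out = []
    for ch in text:
        if not (32 <= ord(ch) <= 126):
            raise ValueError(f"unsupported character: {ch!r}")
        if ch in ('\\', '"'):
            out.append('\\')  # backslash-escape the two special characters
        out.append(ch)
    return '"' + ''.join(out) + '"'


if __name__ == "__main__":
    print(quote_local_part('foo=The quick brown fox'))
```

Whether the receiving system actually accepts quoted local parts is a separate question that this sketch cannot answer.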

I've found a guide to what characters can be used in the local part of an email address, but it seems to make no distinction between "it's forbidden by the RFCs", "it can be confusing, so it's best to stay away", and "it can only be used when escaped properly". Does anyone have a reference to something clearer/faster to read than the appropriate RFCs themselves?

Edit: I have no control over the parsing on the receiving end, and I can't submit the text as anything other than a plain ASCII string.

A: 

Why don't you just confine the characters you use in the email address to the ones in the table you referenced that are marked "OK"?

That would include the plus sign, hyphen (minus), period, upper and lower case letters, numeric digits, and the underscore.
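A rough sketch of that restriction, assuming the conservative set is letters, digits, plus, hyphen, period, and underscore. The function name and the choice of underscore as the replacement are illustrative, not taken from the referenced table. Note that the mapping is lossy: different inputs can collapse to the same output.

```python
import re

# Anything outside the conservative set (assumed: letters, digits, + - . _)
_UNSAFE = re.compile(r"[^A-Za-z0-9+_.-]+")


def to_conservative_local_part(text: str, replacement: str = "_") -> str:
    """Replace each run of unsafe characters with a single replacement."""
    cleaned = _UNSAFE.sub(replacement, text)
    # An unquoted local part may not contain consecutive periods,
    # nor start or end with one.
    cleaned = re.sub(r"\.{2,}", ".", cleaned)
    return cleaned.strip(".")


if __name__ == "__main__":
    print(to_conservative_local_part("John Doe (ABD Admin)"))
```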

Robert Harvey
Makes it difficult to include a string 'John Doe (ABD Admin), Jane Doe (ABD CEO)' in the email address without totally munging it then.
Oesor
Robert Harvey
Then the string is 'John.Doe.ABD.Admin'. I'm not trying to make up an email address that I can use for such and such string, I'm interacting with an API that requires that fields be passed as parts of the email address the mail is sent to. I.e., the address is something like 'foo=The quick brown fox/bar=jumped over the lazy [email protected]', which would theoretically assign foo and bar to those strings, respectively. Yes, it's a somewhat insane API.
Oesor
I assume that if the people who wrote the API use these characters in an email address, that they manage to do so successfully (i.e. they tested their own stuff to make sure it works).
Robert Harvey
You're far more trusting than I am.
Oesor
@Oesor: Perhaps. But if it doesn't work it's not because you aren't making your emails character-safe, it's because they didn't write their API properly. Either way, I don't see how you can fix it if it's broken.
Robert Harvey
+1  A: 

The RFCs themselves are perfectly clear. Ignoring obsolete and quoted forms, the local part of an address is a dot-atom: one or more runs of atext characters separated by single periods, where atext is:

atext       =       ALPHA / DIGIT / ; Any character except controls,
                    "!" / "#" /     ;  SP, and specials.
                    "$" / "%" /     ;  Used for atoms
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"
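As a sketch, the atext set above can be turned into a check for whether a string is usable as an unquoted local part, that is, runs of atext separated by single periods. The names `ATEXT` and `is_dot_atom` are illustrative, and length limits are not checked here.

```python
import re

# Character class built from the atext definition above.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# One or more atext runs separated by single periods.
_DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom(local_part: str) -> bool:
    """Return True if local_part is valid without quoting."""
    return _DOT_ATOM.fullmatch(local_part) is not None


if __name__ == "__main__":
    for candidate in ("foo=bar/baz", "John Doe", "a..b"):
        print(candidate, is_dot_atom(candidate))
```

Anything that fails this check, such as a string containing spaces, parentheses, or commas, would need the quoted form or some other encoding.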
David M.
The table that the OP references gives some very good reasons why some of these characters should not be used.
Robert Harvey
Yeah, found RFC 3696 (http://tools.ietf.org/html/rfc3696#page-5), which does a decent job of summarizing the various rules.
Oesor
A: 

Assuming you control the API, why not Base64 encode the data?

Base64EncodeString("Hello")
Base64DecodeString("SGVsbG8=")

You should probably replace the padding character "=" with an email-safe character such as "-" (minus).

Edit
It seems "=" is a safe character, so there is no need to replace it.
However, the resulting text may start with a digit, so prefix it with a letter and have the receiver discard the prefix.
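A minimal Python sketch of this idea, assuming both ends agree on the scheme. The function names and the one-letter prefix "x" are hypothetical. The standard Base64 alphabet (letters, digits, "+", "/", and "=") falls entirely within the characters allowed in an unquoted local part.

```python
import base64

PREFIX = "x"  # hypothetical marker letter; the receiver must strip it


def encode_field(text: str) -> str:
    """Base64-encode text and prepend a letter so it never starts with a digit."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return PREFIX + encoded


def decode_field(encoded: str) -> str:
    """Reverse encode_field: drop the prefix, then Base64-decode."""
    if not encoded.startswith(PREFIX):
        raise ValueError("missing prefix")
    return base64.b64decode(encoded[len(PREFIX):]).decode("utf-8")


if __name__ == "__main__":
    token = encode_field("Hello")
    print(token, decode_field(token))
```

Base64 grows the data by roughly a third, so longer values can run into the 64-octet limit on the local part.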

Greg
That will work if you don't mind having unreadable email addresses.
Robert Harvey
That would be great if the receiving end could take base64 encoded strings.
Oesor
@Robert - Who reads their email anyway? :)
Greg