The IMAP specification (RFC 2060, 5.1.3. Mailbox International Naming Convention) describes how to handle non-ASCII characters in folder names. It defines a modified UTF-7 encoding:

By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:

  1. UTF-7 uses the "+" character for shifting; this conflicts with the common use of "+" in mailbox names, in particular USENET newsgroup names.

  2. UTF-7's encoding is BASE64 which uses the "/" character; this conflicts with the use of "/" as a popular hierarchy delimiter.

  3. UTF-7 prohibits the unencoded usage of "\"; this conflicts with the use of "\" as a popular hierarchy delimiter.

  4. UTF-7 prohibits the unencoded usage of "~"; this conflicts with the use of "~" in some servers as a home directory indicator.

  5. UTF-7 permits multiple alternate forms to represent the same string; in particular, printable US-ASCII characters can be represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").

Before I start implementing it, my question is: is there some .NET code or library out there (or even in the framework itself) that does the job? I couldn't find any .NET resources, only implementations for other languages and frameworks.

Thank you!

+1  A: 

This is too specialized to be part of the framework. There might be something on CodePlex, though many of the incomplete "implementations" I've seen don't bother with the conversion at all and will happily pass all non-US-ASCII characters on to the IMAP server.

However, I've implemented this in the past and it is really just about 30 lines of code. You go through all characters in the string and output them unchanged if they fall in the range 0x20 to 0x7e (don't forget to append "-" after the "&"). Otherwise, collect each run of consecutive non-US-ASCII characters and convert it the way UTF-7 does, that is, UTF-16 (big-endian) followed by base64, not UTF-8 + base64, replacing "/" with "," and leaving off the "=" padding. Additionally, you need to maintain the "shifted state", i.e. whether you're currently encoding non-US-ASCII or outputting US-ASCII, and emit the transition tokens "&" and "-" on each state change.
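To make the steps concrete, here is a minimal sketch of that loop. It is written in Python purely for illustration (it is not .NET code and not the implementation mentioned above), and the function names `encode_mailbox_name` and `decode_mailbox_name` are made up for this example. It assumes the input is an ordinary Unicode string and does no validation beyond what the standard library provides.

```python
import base64


def _modified_base64(chunk: str) -> str:
    """Encode a run of characters as modified BASE64: UTF-16BE bytes,
    standard base64, no '=' padding, and ',' in place of '/'."""
    raw = chunk.encode("utf-16-be")
    b64 = base64.b64encode(raw).decode("ascii")
    return b64.rstrip("=").replace("/", ",")


def encode_mailbox_name(name: str) -> str:
    """Hypothetical helper: encode a mailbox name as IMAP modified UTF-7."""
    out = []
    pending = []  # characters collected while in the shifted (BASE64) state

    def flush():
        # Leave the shifted state: emit "&", the encoded run, then "-".
        if pending:
            out.append("&" + _modified_base64("".join(pending)) + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # Printable US-ASCII represents itself, except "&" -> "&-".
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()  # a name must end in US-ASCII, so close any open run
    return "".join(out)


def decode_mailbox_name(encoded: str) -> str:
    """Hypothetical helper: reverse of encode_mailbox_name."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        end = encoded.index("-", i)  # raises ValueError if the shift is unterminated
        if end == i + 1:
            out.append("&")  # "&-" is a literal ampersand
        else:
            b64 = encoded[i + 1:end].replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    print(encode_mailbox_name("Tom & Jerry"))  # Tom &- Jerry
    print(encode_mailbox_name("~peter/mail/台北/日本語"))  # ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

Collecting a whole run before encoding matters: base64 works on groups of three bytes, so encoding each character separately would produce a different (and non-canonical) result. A .NET port would presumably follow the same shape using `Encoding.BigEndianUnicode` and `Convert.ToBase64String`, but that is an assumption and not something tested here.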

liggett78