I want to launch the default e-mail client application via the ShellExecute function.

That is, I write something like this:

ShellExecute(0, 'mailto:[email protected]?subject=example&body=example', ...);

How can I encode non-US characters in subject and body?

I can't use the default ANSI code page, because the characters can be anything: Chinese, Cyrillic, or something else.

P.S. Notes:

  1. I'm using the ShellExecuteW function.
  2. Leaving subject and body "as is" does not work (tested with the Windows Live Mail client on Win7 and Outlook Express on WinXP).
  3. Encoding the subject as URLEncode(UTF8Encode(Subject)) works for Windows Live Mail, but not for Outlook Express.
  4. URLEncode(UTF8Encode(Body)) does not work for either client.
+1  A: 

The interpretation of the command line is up to the launched program. Depending on which e-mail client is installed, you may or may not get Unicode support, and it may come in one form or another. So there's no single recipe. Some clients may use an ANSI command line (because why not?), some may respect URL-encoded characters, etc.

Your best bet is to detect 3-4 popular mailers by reading the registry and customize your command line accordingly. Very inelegant, and incomplete by design, but there is nothing else you can do.

Seva Alekseyev
Well, I'd rather use MAPI or something else then :D
Alexander
+3  A: 

mailto:[email protected]?subject=example&body=%e5%85%ad

The short answer is no. Characters must be percent-encoded as defined by RFC 3986 and its predecessors. RFC 2368 defines the structure of the mailto URI.

#include <windows.h>

int main(void) {
  /* The body is U+516D as UTF-8 (E5 85 AD), with each byte percent-encoded. */
  ShellExecute(NULL, TEXT("open"),
    TEXT("mailto:[email protected]?subject=example&body=%e5%85%ad"),
    NULL, NULL, SW_SHOWNORMAL);

  return 0;
}

The body in this case is the CJK character U+516D (六) encoded as UTF-8 (E5 85 AD). This works correctly with Mozilla Thunderbird (you may need to install additional fonts if it does not).
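For arbitrary text, the encoding step can be done with a small helper. What follows is only a minimal sketch, not a definitive implementation: the function name percent_encode_utf8 is hypothetical, and it assumes the input is already a NUL-terminated UTF-8 byte string. It passes the RFC 3986 unreserved characters through and writes every other byte as %XX. It emits uppercase hex digits, which are equivalent to the lowercase ones used in the example above.

```c
#include <stddef.h>

/* Hypothetical helper: percent-encode a NUL-terminated UTF-8 string so it can
 * be used as a subject or body value in a mailto: URI.
 * RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are copied unchanged;
 * every other byte, including each byte of a multi-byte UTF-8 sequence, is
 * written as %XX.
 * Returns the number of characters written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
size_t percent_encode_utf8(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; ++p) {
        int unreserved = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                         (*p >= '0' && *p <= '9') ||
                         *p == '-' || *p == '.' || *p == '_' || *p == '~';
        size_t need = unreserved ? 1 : 3;

        /* Keep one byte in reserve for the terminating NUL. */
        if (out + need + 1 > dst_size)
            return (size_t)-1;

        if (unreserved) {
            dst[out++] = (char)*p;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[*p >> 4];
            dst[out++] = hex[*p & 0x0F];
        }
    }

    if (dst_size == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

When calling ShellExecuteW you would start from UTF-16 text, so you would presumably convert it to UTF-8 first (for example with WideCharToMultiByte and CP_UTF8), encode it, and then widen the resulting pure-ASCII URI again.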

The rest is up to how your user-agent (mail client) interprets the URI. RFC 3986 mandates UTF-8, but prior specifications did not. A user-agent may fail to interpret the data correctly if it pre-dates RFC 3986, has not been updated, or is maintaining backwards compatibility with prior implementations.

Note: URLEncode functions generally implement the HTML application/x-www-form-urlencoded encoding. This will probably cause space characters to be replaced by plus characters, whereas a space in a mailto URI should be encoded as %20.
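If the only encoder available is such a form-style URLEncode, its output can be patched afterwards. The sketch below assumes the routine really follows application/x-www-form-urlencoded rules, so that every '+' in its output stands for a space and a literal plus sign has already become %2B. The function name form_plus_to_percent20 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: make form-urlencoded text usable in a mailto: URI by
 * replacing each '+' (an encoded space) with "%20". All other characters,
 * including existing %XX escapes such as "%2B", are copied unchanged.
 * Returns the number of characters written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
size_t form_plus_to_percent20(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    for (; *src != '\0'; ++src) {
        size_t need = (*src == '+') ? 3 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (out + need + 1 > dst_size)
            return (size_t)-1;

        if (*src == '+') {
            dst[out++] = '%';
            dst[out++] = '2';
            dst[out++] = '0';
        } else {
            dst[out++] = *src;
        }
    }

    if (dst_size == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```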

Note 2: I'm not current on the state of IRI support in the Windows shell, but it's probably worth looking into. However, some characters in the query part will still need to be percent-encoded.

McDowell
Okay, so the short answer is: "use %-encoding on the UTF-8 form" (so I was doing it almost right). If you get garbage, throw out your e-mail client :D
Alexander