
I am looking for a good way to URI-escape strings in C++ that would be reasonable for a cross-platform project.

I would like a function that would take a string like this:

L"jiayou加油"

And return:

L"jiayou%E5%8A%A0%E6%B2%B9"

I looked at using something like this, with minor modifications to use wchar_t. However, that would require converting from UTF-16 to UTF-8 before the printf call, and that has led me down into character-encoding hell.

This and all the other approaches I have looked into just feel like the wrong way. Is there a good way to URI-escape a wstring in C++?
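For the UTF-16 to UTF-8 step mentioned in the question, one portable option is a small hand-written converter, since std::wstring_convert (the standard-library facility for this) is deprecated as of C++17. This is a minimal sketch, not a definitive implementation; it assumes wchar_t holds UTF-16 where it is 2 bytes wide (as on Windows) and UTF-32 where it is 4 bytes wide (as on most Unix-like systems), and the function names are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: 2-byte wchar_t means UTF-16, 4-byte wchar_t means UTF-32.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    out.reserve(in.size() * 3);
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::runtime_error("truncated surrogate pair");
            char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("invalid surrogate pair");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;  // consumed the low surrogate as well
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Lone surrogates are not valid Unicode scalar values.
            throw std::runtime_error("unpaired surrogate");
        } else if (cp > 0x10FFFF) {
            throw std::runtime_error("code point out of range");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this, L"jiayou加油" becomes the bytes "jiayou\xE5\x8A\xA0\xE6\xB2\xB9", which a byte-oriented percent-encoder can then handle without any printf tricks.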

A:

No matter what you do, you're in some sort of character-encoding hell (that's just the way it is with character encodings).

From http://labs.apache.org/webarch/uri/rfc/rfc3986.html#characters:

The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

So at some point you need to convert your URI to the encoding that's appropriate for whatever you're sending the URI to. If that's UTF-8, you might as well do that conversion before you percent-encode, so you can use the library routine you've already found. If it's not UTF-8, you need to know what encoding the recipient of the URI is expecting (again, that's the way it is with charset encodings: you have to know what the other guy is expecting, or be able to tell him), so that you can percent-encode the characters in that encoding.
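For the common case where the recipient expects UTF-8, the percent-encoding step itself is simple once the string is already a sequence of UTF-8 bytes. A minimal sketch, assuming UTF-8 input in a std::string; the name percent_encode is hypothetical, and because it escapes everything outside the RFC 3986 unreserved set (including "/" and "?"), it suits a single path segment or query value rather than a whole URI:

```cpp
#include <string>

// Hypothetical helper: percent-encode a UTF-8 byte string.
// Unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~") pass through;
// every other byte becomes "%XX" with uppercase hex digits.
std::string percent_encode(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(utf8.size() * 3);
    for (unsigned char c : utf8) {  // unsigned so bytes >= 0x80 index hex[] correctly
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                          c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

The result is pure ASCII, so if a wstring is needed afterwards, widening it with `std::wstring(out.begin(), out.end())` is safe; for the question's input (already converted to UTF-8) that yields L"jiayou%E5%8A%A0%E6%B2%B9".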

Michael Burr