Hi,

if I have UTF-8 encoded data, is it safe to send it in an HTTP body? The thing is that UTF-8 data could include control characters, including the null character (binary zero), which I assume are not allowed by the HTTP RFC. So what should I do with such data? Encode it with base64?

On the other hand, the data I have in UTF-8 is XML, and the XML specification forbids the use of most control characters (http://www.w3.org/TR/2006/REC-xml-20060816/#charsets)...

So I guess that arbitrary UTF-8 is not safe, but XML in UTF-8 is safe and can be embedded directly in the HTTP body, e.g. in a MIME multipart body, without needing something like quoted-printable encoding.

BR STeN

+2  A: 

HTTP allows the sending of ARBITRARY data. So yes, UTF-8 is safe for HTTP; but on the gripping hand, 0x00 isn't really "safe" anywhere. Both HTTP request and response bodies have a method for dealing with arbitrary data, as does MIME (which usually encapsulates HTTP POST bodies), namely a Content-Length header.

There is no control character that can cause a compliant HTTP implementation to assume that the body is done before it has read Content-Length bytes.
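
To illustrate the point about length-based framing, here is a minimal sketch in Python. It does not touch the network; `build_response` and `read_body` are hypothetical helpers written for this example, not part of any library, and the parser only handles the simple Content-Length case (no chunked transfer encoding). The body deliberately contains a NUL byte and a blank-line sequence to show that neither ends the body early.

```python
def build_response(body: bytes) -> bytes:
    """Frame a body with a Content-Length header (hypothetical helper)."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/xml; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def read_body(message: bytes) -> bytes:
    """Extract the body by counting bytes, not by scanning for a terminator."""
    # The first blank line ends the headers; everything after it is body data.
    head, _, rest = message.partition(b"\r\n\r\n")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    if length is None:
        raise ValueError("no Content-Length header")
    return rest[:length]


# UTF-8 text containing a NUL byte and a CRLF CRLF sequence inside the body.
payload = "caf\u00e9\x00\r\n\r\nmore".encode("utf-8")
assert read_body(build_response(payload)) == payload
```

The receiver never inspects the body bytes to decide where the message ends, which is why control characters in the body are harmless at the HTTP level.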

Williham Totland
utf-8 doesn't have 0x00
Andrey
Hi Williham, thanks for the answer - I do not know why I thought that the HTTP body does not allow special characters... I am an idiot. This makes my question irrelevant. Thanks a lot for the response!
STeN
Hi Andrey, UTF-8 actually allows all special ASCII characters... Check RFC 3629; it says "...US-ASCII characters are encoded in one octet having the normal US-ASCII value ..." This makes UTF-8 backward compatible... BR
STeN
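
The RFC 3629 point above can be checked quickly in Python. This is only a small sketch using the built-in `str.encode`; the variable names are made up for the example.

```python
# U+0000 (NUL) is encoded in UTF-8 as the single byte 0x00.
nul = "\u0000".encode("utf-8")
assert nul == b"\x00"

# Every US-ASCII code point, control characters included, keeps its
# ASCII value as a single octet when encoded as UTF-8.
ascii_chars = "".join(chr(c) for c in range(0x80))
assert ascii_chars.encode("utf-8") == bytes(range(0x80))
```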
+2  A: 

HTTP message bodies can contain arbitrary data (as Williham pointed out).

Furthermore, there is no quoted-printable encoding in HTTP, nor do you need a multipart body.

How do you think images on the Web work? :-)

Julian Reschke
Hi, the example with images is more than clear. The reason why I did not think about binaries in the body is that I was working 99% in the SIP world, where base64 is used almost everywhere. Thanks for your time. BR
STeN