



I understand that the default encoding of an HTTP Request is ISO 8859-1.

Am I able to use Unicode to decode an HTTP request given as a byte array?

If not, how would I decode such a request in C#?

EDIT: I'm developing a server, not a client.


The code below should help. If you are expecting a large amount of data to stream down, reading it asynchronously is the best way to go about it.

using System;
using System.IO;
using System.Net;
using System.Text;

string myUrl = @"";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myUrl);

// Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4;
request.MaximumResponseHeadersLength = 4; // in kilobytes
request.Timeout = 15000;                  // in milliseconds

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream receiveStream = response.GetResponseStream())
using (StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8))
{
    char[] read = new char[512];

    // Reads up to 512 characters at a time.
    int count = readStream.Read(read, 0, 512);

    while (count > 0)
    {
        // Dumps the characters read into a string and displays the string.
        string str = new string(read, 0, count);
        Console.Write(str);
        count = readStream.Read(read, 0, 512);
    }
}
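For the asynchronous approach mentioned above, here is a minimal sketch that reads any Stream in 512-character chunks with StreamReader.ReadAsync. It assumes you already have a Stream (a response stream on a client, or the request body stream on a server); the helper name ReadAllTextAsync is hypothetical. StreamReader.ReadToEndAsync would do the same in one call; the explicit loop just mirrors the chunked reads in the code above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper: reads a stream to the end asynchronously,
// decoding it with the given encoding 512 characters at a time.
static async Task<string> ReadAllTextAsync(Stream stream, Encoding encoding)
{
    var sb = new StringBuilder();
    using var reader = new StreamReader(stream, encoding);
    char[] chunk = new char[512];
    int count;
    while ((count = await reader.ReadAsync(chunk, 0, chunk.Length)) > 0)
    {
        // StreamReader keeps multi-byte sequences intact across chunk boundaries.
        sb.Append(chunk, 0, count);
    }
    return sb.ToString();
}

// Usage with an in-memory stream standing in for a network stream.
byte[] payload = Encoding.UTF8.GetBytes("caf\u00E9");
string text = await ReadAllTextAsync(new MemoryStream(payload), Encoding.UTF8);
Console.WriteLine(text);
```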
Sorry, that looks like code that requests a web resource and decodes the response. I guess I should clarify my question. You're doing something with UTF-8 there... Can I just use that to decode an HTTP request?
Charlie Somerville
+2  A: 

As you said, the default encoding of an HTTP POST request is ISO-8859-1. Otherwise, you have to look at the charset parameter of the Content-Type header, which might then look like Content-Type: application/x-www-form-urlencoded; charset=UTF-8.

Once you have read the posted data into a byte array, you may decide to convert this buffer to a string (remember that all strings in .NET are UTF-16). Only at that moment do you need to know the encoding.

byte[] buffer = ReadFromRequestStream(...);
string data = Encoding
              .GetEncoding("DETECTED ENCODING OR ISO-8859-1")
              .GetString(buffer);
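One way to fill in the "detected encoding" step is sketched below, assuming the raw Content-Type header value is available as a string. It parses the header with System.Net.Mime.ContentType and reads its CharSet property; the helper name EncodingFromContentType is hypothetical, and the ISO-8859-1 fallback follows the default stated above.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper: returns the Encoding named by the charset parameter
// of a Content-Type header value, or ISO-8859-1 if there is none.
static Encoding EncodingFromContentType(string contentTypeHeader)
{
    Encoding fallback = Encoding.GetEncoding("iso-8859-1");
    if (string.IsNullOrWhiteSpace(contentTypeHeader))
        return fallback;

    try
    {
        string charset = new ContentType(contentTypeHeader).CharSet;
        return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
    }
    catch (FormatException)
    {
        return fallback; // malformed Content-Type header
    }
    catch (ArgumentException)
    {
        return fallback; // charset name not supported by this runtime
    }
}

// "name=é" as posted by a client that encoded the form in UTF-8.
byte[] buffer = { 0x6E, 0x61, 0x6D, 0x65, 0x3D, 0xC3, 0xA9 };
string header = "application/x-www-form-urlencoded; charset=UTF-8";
string data = EncodingFromContentType(header).GetString(buffer);
Console.WriteLine(data);
```

Falling back silently on an unknown charset is a design choice; a stricter server might reject such a request instead.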

And to answer your question:

Am I able to use Unicode to decode an HTTP request given as a byte array?

Yes, if a Unicode encoding (here UTF-8) was used to encode this byte array:

string data = Encoding.UTF8.GetString(buffer);
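To see why the "if" matters, this small sketch decodes the same UTF-8 bytes twice: once with UTF-8, which recovers the text, and once as ISO-8859-1, which maps every byte to one character and so turns the two-byte sequence for 'é' into two unrelated characters. The byte values are just an example.

```csharp
using System;
using System.Text;

// "café" as it would arrive from a client that encoded it in UTF-8.
byte[] buffer = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };

// Decoding with the encoding that produced the bytes recovers the text.
string correct = Encoding.UTF8.GetString(buffer);

// Decoding the same bytes as ISO-8859-1 yields one char per byte,
// so 0xC3 0xA9 becomes 'Ã' followed by '©'.
string garbled = Encoding.GetEncoding("iso-8859-1").GetString(buffer);

Console.WriteLine(correct);
Console.WriteLine(garbled);
```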
Darin Dimitrov
+1  A: 

You shouldn't use a Unicode encoding to decode data that was not encoded with a Unicode encoding, because characters outside the ASCII range would not be decoded correctly.

Create an Encoding object for the correct encoding and use that:

Encoding iso = Encoding.GetEncoding("iso-8859-1");
string request = iso.GetString(requestArray);
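The opposite mistake is shown in this sketch: bytes produced by ISO-8859-1 decode correctly with the ISO-8859-1 Encoding, but decoding them as UTF-8 loses the non-ASCII character. The sample string is only an illustration.

```csharp
using System;
using System.Text;

Encoding iso = Encoding.GetEncoding("iso-8859-1");

// "café" encoded as ISO-8859-1: 'é' is the single byte 0xE9.
byte[] requestArray = iso.GetBytes("caf\u00E9");

// Same encoding on both sides: the text round-trips.
string right = iso.GetString(requestArray);

// 0xE9 is the lead byte of a three-byte UTF-8 sequence, but nothing valid
// follows it, so the UTF-8 decoder substitutes U+FFFD (replacement character).
string wrong = Encoding.UTF8.GetString(requestArray);

Console.WriteLine(right);
Console.WriteLine(wrong);
```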

Every time .NET transfers information between an external representation (e.g. a TCP socket) and the internal Unicode format (or the other way around), some form of encoding is involved.
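Since you are writing a server, here is a rough sketch of where that encoding step falls when raw request bytes arrive from a socket. It assumes the whole request is already in one byte array and ignores Content-Length and chunked transfer encoding; the helper name SplitRequest is hypothetical. The request line and headers are decoded as ISO-8859-1 (a one-to-one byte-to-char mapping), while the body is kept as bytes until its charset is known.

```csharp
using System;
using System.Text;

// Hypothetical helper: splits raw request bytes at the first CRLFCRLF.
// Headers are decoded as ISO-8859-1; the body is returned undecoded.
static (string Head, byte[] Body) SplitRequest(byte[] raw)
{
    for (int i = 0; i + 3 < raw.Length; i++)
    {
        if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '\r' && raw[i + 3] == '\n')
        {
            string headerText = Encoding.GetEncoding("iso-8859-1").GetString(raw, 0, i);
            byte[] bodyPart = raw[(i + 4)..];
            return (headerText, bodyPart);
        }
    }
    throw new FormatException("End of headers (CRLFCRLF) not found.");
}

// Build a sample request: ASCII headers followed by a UTF-8 body.
byte[] headBytes = Encoding.ASCII.GetBytes(
    "POST /form HTTP/1.1\r\n" +
    "Host: example.com\r\n" +
    "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n" +
    "\r\n");
byte[] bodyBytes = Encoding.UTF8.GetBytes("name=caf\u00E9");
byte[] requestBytes = new byte[headBytes.Length + bodyBytes.Length];
headBytes.CopyTo(requestBytes, 0);
bodyBytes.CopyTo(requestBytes, headBytes.Length);

var (headText, body) = SplitRequest(requestBytes);

// UTF-8 here because the Content-Type header above declares charset=UTF-8.
string bodyText = Encoding.UTF8.GetString(body);
Console.WriteLine(headText.Split("\r\n")[0]);
Console.WriteLine(bodyText);
```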

See the utf-8-vs-unicode question, especially Jon Skeet's answer, which references Joel Spolsky's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Joel's article was the very reason I decided to think about Encoding rather than just blindly using ASCII :p
Charlie Somerville
Remember, neither UTF-8 nor even UTF-16 is Unicode itself; both are encodings of Unicode.