I understand that the default encoding of an HTTP Request is ISO 8859-1.
Am I able to use Unicode to decode an HTTP request given as a byte array?
If not, how would I decode such a request in C#?
EDIT: I'm developing a server, not a client.
Hi,
The code below should help. If you expect a large amount of data to stream down, reading it asynchronously is the best way to go.
using System;
using System.IO;
using System.Net;
using System.Text;

string myUrl = @"http://somedomain.com/file";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myUrl);
// Set some reasonable limits on resources used by this request.
request.MaximumAutomaticRedirections = 4;
request.MaximumResponseHeadersLength = 4; // in kilobytes
request.Timeout = 15000; // in milliseconds
// This sample assumes the response body is UTF-8.
Encoding encode = Encoding.GetEncoding("utf-8");
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream receiveStream = response.GetResponseStream())
using (StreamReader readStream = new StreamReader(receiveStream, encode))
{
    char[] read = new char[512];
    // Read up to 512 characters at a time.
    int count = readStream.Read(read, 0, read.Length);
    while (count > 0)
    {
        // Copy the characters just read into a string and display it.
        string str = new string(read, 0, count);
        Console.Write(str);
        count = readStream.Read(read, 0, read.Length);
    }
}
As you said, the default encoding of an HTTP POST request is ISO-8859-1. Otherwise you have to look at the Content-Type header, which might then look like Content-Type: application/x-www-form-urlencoded; charset=UTF-8.
Once you have read the posted data into a byte array you may decide to convert this buffer to a string (remember all strings in .NET are UTF-16). It is only at that moment that you need to know the encoding.
byte[] buffer = ReadFromRequestStream(...);
string data = Encoding
    .GetEncoding("DETECTED ENCODING OR ISO-8859-1")
    .GetString(buffer);
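To make the "detected encoding or ISO-8859-1" step concrete, here is a minimal sketch of one way to do it. The RequestDecoder class and its method names are hypothetical, and it assumes you have already extracted the raw Content-Type header value from the request yourself. It uses System.Net.Mime.ContentType to pull out the charset parameter and falls back to ISO-8859-1 when the header is missing, malformed, or names an encoding that .NET does not know.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper, not part of any framework API.
static class RequestDecoder
{
    // Picks the charset named in a Content-Type header value,
    // falling back to ISO-8859-1 when it is absent or unusable.
    public static Encoding ResolveEncoding(string contentTypeHeader)
    {
        Encoding fallback = Encoding.GetEncoding("iso-8859-1");
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
            return fallback;

        try
        {
            var contentType = new ContentType(contentTypeHeader);
            return string.IsNullOrEmpty(contentType.CharSet)
                ? fallback
                : Encoding.GetEncoding(contentType.CharSet);
        }
        catch (FormatException)
        {
            return fallback; // header could not be parsed
        }
        catch (ArgumentException)
        {
            return fallback; // charset name not recognised
        }
    }

    // Decodes the request body bytes using the resolved encoding.
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        return ResolveEncoding(contentTypeHeader).GetString(body);
    }
}
```

You would then call something like RequestDecoder.Decode(buffer, contentTypeHeaderValue) once the whole body has been read. Whether silently falling back is acceptable depends on your server; rejecting an unknown charset with an error response is another reasonable choice.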
And to answer your question:
Am I able to use Unicode to decode an HTTP request given as a byte array?
Yes, if a Unicode encoding (UTF-8 in this example) was used to encode this byte array:
string data = Encoding.UTF8.GetString(buffer);
You don't use a Unicode encoding to decode something that was not encoded with a Unicode encoding, as that would not decode all characters correctly.
Create an Encoding object for the correct encoding and use that:
Encoding iso = Encoding.GetEncoding("iso-8859-1");
string request = iso.GetString(requestArray);
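To illustrate what "would not decode all characters correctly" looks like in practice, here is a small sketch. The MisdecodeDemo class and the sample bytes are made up for illustration: the array holds "café" as ISO-8859-1, where é is the single byte 0xE9. That byte is not a valid UTF-8 sequence on its own, so the default Encoding.UTF8 substitutes the replacement character U+FFFD for it.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the byte values stand in for a real request.
static class MisdecodeDemo
{
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        // "café" encoded as ISO-8859-1: 'c', 'a', 'f', then 0xE9 for 'é'.
        byte[] requestArray = { 0x63, 0x61, 0x66, 0xE9 };

        string wrong = DecodeAsUtf8(requestArray);
        string right = DecodeAsLatin1(requestArray);

        // The UTF-8 result contains U+FFFD where 'é' should be.
        Console.WriteLine(wrong.Contains("\uFFFD"));
        // The ISO-8859-1 result round-trips to the original text.
        Console.WriteLine(right == "caf\u00E9");
    }
}
```

Note that the damage is silent: no exception is thrown with the default UTF-8 encoding, so the wrong choice only shows up as corrupted characters later on.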
Every time .NET transfers information between an external representation (e.g. a TCP socket) and the internal Unicode format (or the other way around), some form of encoding is involved.
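As a rough illustration of that point, the sketch below encodes the same .NET string with three different Encoding objects and compares the byte counts. The EncodingSizes class is hypothetical. It also shows why "Unicode" is ambiguous here: in .NET, Encoding.Unicode means UTF-16 (little-endian) specifically, which is different from UTF-8 on the wire.

```csharp
using System;
using System.Text;

// Hypothetical demo class for comparing external representations.
static class EncodingSizes
{
    // Returns how many bytes the text occupies in the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Length;
    }

    public static void Main()
    {
        string text = "caf\u00E9"; // "café", held internally as UTF-16

        // One byte per character in ISO-8859-1.
        Console.WriteLine(ByteCount(text, Encoding.GetEncoding("iso-8859-1"))); // 4
        // 'é' takes two bytes in UTF-8.
        Console.WriteLine(ByteCount(text, Encoding.UTF8)); // 5
        // Two bytes per character in UTF-16.
        Console.WriteLine(ByteCount(text, Encoding.Unicode)); // 8
    }
}
```

The three byte arrays all represent the same four characters, which is why the receiver has to be told (or has to assume) which encoding was used before it can turn the bytes back into a string.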
See utf-8-vs-unicode, especially Jon Skeet's answer, with the reference to Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).