I am accepting a POST request like so:

Socket connection = m_connection;
Byte[] receive = new Byte[1024];

int received = connection.Receive(receive);
Console.WriteLine(received.ToString());

string request = Encoding.ASCII.GetString(receive);
Console.WriteLine(request);

The posted values end up looking weird. When I post text values, they often come back with a lot of +'s after them. If I post C:\Users\John Doe\wwwroot, it ends up being: C%3A%5CUsers%5CJohn+Doe%5Cwwwroot

index.html becomes index.html++++++++++++++++++++++++++++++++

It seems I am getting the encoding wrong somehow. However, I tried multiple encodings and they all show the same weirdness. What is the best way to correctly read an HTTP POST request from a socket byte stream?

+2  A: 

You need to trim the byte array receive that you are passing to the GetString method. Right now, you are passing all 1024 bytes, so the GetString method is trying to decode all of them as best it can, including the bytes that Receive never filled.

You need to use the received variable to indicate the bounds for the string you are encoding.
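A minimal sketch of that fix, reusing the receive and received names from the question. The sample bytes stand in for whatever Socket.Receive would have written and are made up for illustration; the TrimExample class and Decode helper are hypothetical names.

```csharp
using System;
using System.Text;

public static class TrimExample
{
    // Decode only the first `count` bytes of the buffer, not the whole array.
    public static string Decode(byte[] buffer, int count)
    {
        return Encoding.ASCII.GetString(buffer, 0, count);
    }

    public static void Main()
    {
        // Stand-in for the 1024-byte buffer filled by connection.Receive(receive).
        byte[] receive = new byte[1024];
        byte[] sample = Encoding.ASCII.GetBytes("file=index.html");
        Array.Copy(sample, receive, sample.Length);
        int received = sample.Length; // what Receive would have returned

        Console.WriteLine(Decode(receive, received));
    }
}
```

The three-argument GetString(bytes, index, count) overload is what limits decoding to the bytes that were actually received.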

casperOne
+1  A: 

First off, you don't need to decode the input: HTTP headers are ASCII, and it will be faster to work with just the bytes. What you'll want to do is define a maximum HTTP request header size, say 4K, and then keep reading bytes until you hit \r\n\r\n, which signals the end of the request headers. You need to enforce this maximum header size, otherwise a single malicious user could send an endless HTTP request and your server would run out of memory.
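The read-until-blank-line loop with a size cap could be sketched roughly as follows. It assumes the socket has been wrapped in a Stream (for example a NetworkStream); HeaderReader, ReadHeaders and the 4096-byte constant are illustrative names and values, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderReader
{
    // Assumed limit, following the "say 4K" suggestion.
    public const int MaxHeaderBytes = 4096;

    // Reads one byte at a time until \r\n\r\n has been seen, and returns the
    // header bytes including that terminator. Throws if the limit is reached
    // or the stream ends before the terminator arrives.
    public static byte[] ReadHeaders(Stream stream, int maxBytes)
    {
        byte[] terminator = { 13, 10, 13, 10 }; // \r\n\r\n
        var buffer = new MemoryStream();
        int matched = 0; // number of terminator bytes matched so far

        while (buffer.Length < maxBytes)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Connection closed before the headers ended.");
            buffer.WriteByte((byte)b);

            if (b == terminator[matched])
                matched++;
            else
                matched = (b == 13) ? 1 : 0; // a stray \r may start a new match

            if (matched == terminator.Length)
                return buffer.ToArray();
        }
        throw new InvalidDataException("Request headers exceed the size limit.");
    }

    public static void Main()
    {
        byte[] raw = Encoding.ASCII.GetBytes("POST / HTTP/1.1\r\nHost: example\r\n\r\na=1");
        using (var stream = new MemoryStream(raw))
        {
            byte[] headers = ReadHeaders(stream, MaxHeaderBytes);
            Console.WriteLine(Encoding.ASCII.GetString(headers));
        }
    }
}
```

Reading a byte at a time is slow on a real socket, but it never consumes any of the body, which keeps the sketch short. A production server would read larger chunks and keep the leftover bytes for the body.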

You should read the HTTP specification.

Depending on the request, the HTTP content (the body) can be many things, and you need to act accordingly. The protocol part itself (the request line and headers) is always ASCII, so you can treat it as just bytes, but the content can be encoded very differently. This is generally described by the Content-Type header. But again, read the HTTP specification.

John Leidegren
+1  A: 

You should use System.Web.HttpUtility.UrlDecode, not Encoding.ASCII, to perform the decoding.

You will probably get away with passing Encoding.Default as the second parameter to this static method.

You are seeing the result of an HTML form POST, which encodes the values as if they were being appended to a URL as a query string. Hence it is an &-delimited set of name=value pairs. Spaces are encoded as + and any reserved or non-ASCII characters are encoded as their hex value, %xx.

The UrlDecode method will decode all this for you.
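As a rough illustration, a form body could be split and decoded like this. FormDecoder and Parse are hypothetical names; HttpUtility.UrlDecode(string, Encoding) is the real method. On .NET Framework this assumes a reference to the System.Web assembly, and the sample body is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;

public static class FormDecoder
{
    // Splits an application/x-www-form-urlencoded body into name/value pairs
    // and URL-decodes both sides with the given encoding.
    public static Dictionary<string, string> Parse(string body, Encoding encoding)
    {
        var values = new Dictionary<string, string>();
        foreach (string pair in body.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            // If a name repeats, the last value wins in this sketch.
            values[HttpUtility.UrlDecode(name, encoding)] = HttpUtility.UrlDecode(value, encoding);
        }
        return values;
    }

    public static void Main()
    {
        string body = "path=C%3A%5CUsers%5CJohn+Doe%5Cwwwroot&file=index.html";
        foreach (var kv in Parse(body, Encoding.UTF8))
            Console.WriteLine(kv.Key + " = " + kv.Value);
    }
}
```

Splitting on & and = before decoding matters: decoding first would turn an encoded %26 or %3D inside a value into a delimiter.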

As others have stated, you really need to read the stream in chunks; the request may be bigger than 1K.
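One way to read a body of known size in chunks is sketched below. It assumes the body length has already been taken from the request's Content-Length header and that the socket is wrapped in a Stream; BodyReader and ReadBody are illustrative names.

```csharp
using System;
using System.IO;

public static class BodyReader
{
    // Keeps calling Read until exactly contentLength bytes have arrived.
    // A single Read may return fewer bytes than requested, hence the loop.
    public static byte[] ReadBody(Stream stream, int contentLength)
    {
        byte[] body = new byte[contentLength];
        int total = 0;
        while (total < contentLength)
        {
            int n = stream.Read(body, total, contentLength - total);
            if (n == 0)
                throw new EndOfStreamException("Connection closed before the full body arrived.");
            total += n;
        }
        return body;
    }

    public static void Main()
    {
        // Stand-in for a network stream carrying a 3000-byte body.
        using (var stream = new MemoryStream(new byte[3000]))
        {
            byte[] body = ReadBody(stream, 3000);
            Console.WriteLine(body.Length);
        }
    }
}
```

A real server would also cap contentLength before allocating the buffer, for the same reason the header size is capped.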

Strictly speaking, you should check the Content-Type header for a ;CharSet= attribute. If one is present, you need to ensure the character encoding you pass to UrlDecode matches that CharSet (e.g., if CharSet=UTF-8 then use Encoding.UTF8).
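Picking the encoding from the header might look something like this. It uses System.Net.Mime.ContentType to parse the header value; CharsetPicker, FromContentType and the fallback parameter are hypothetical, and which fallback is appropriate is left to the caller.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetPicker
{
    // Returns the encoding named by the charset attribute of a Content-Type
    // header value, or the fallback if the attribute is missing or unusable.
    public static Encoding FromContentType(string headerValue, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
            return fallback;
        try
        {
            string charset = new ContentType(headerValue).CharSet;
            return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return fallback; }   // malformed header value
        catch (ArgumentException) { return fallback; } // unknown charset name
    }

    public static void Main()
    {
        Encoding enc = FromContentType(
            "application/x-www-form-urlencoded; charset=utf-8", Encoding.ASCII);
        Console.WriteLine(enc.WebName);
    }
}
```

The returned Encoding can then be passed as the second argument to HttpUtility.UrlDecode.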

AnthonyWJones