If you keep the default behavior (loHttp.AllowAutoRedirect = true) and your code doesn't work (you don't get redirected to the new resource), it means that the server is not encoding the Location header correctly. Is the redirect working in the browser?
For example, if the redirect URL is http://site/Μία_Σελίδα, the Location header must look like http://site/%CE%95%CE%BD%CE%B9%CE%B1%CE%AF%CE%BF_%CE%94%CE%B5%CE%.
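To make the expected form concrete, here is a minimal sketch that percent-encodes the path segment from the example URL as UTF-8. The class and method names are hypothetical; Uri.EscapeDataString is the only framework call used. The encoded value quoted above is cut off, so the sketch computes the encoding for Μία_Σελίδα directly.

```csharp
using System;

public static class LocationEncoding
{
    // Hypothetical helper: percent-encodes one path segment as UTF-8.
    // Unreserved characters (letters, digits, '-', '_', '.', '~') are left as-is.
    public static string EncodeSegment(string segment)
    {
        return Uri.EscapeDataString(segment);
    }
}
```

Calling "http://site/" + LocationEncoding.EncodeSegment("Μία_Σελίδα") should yield http://site/%CE%9C%CE%AF%CE%B1_%CE%A3%CE%B5%CE%BB%CE%AF%CE%B4%CE%B1, which is the kind of value the server is expected to put in the Location header.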
UPDATE:
After investigating the issue further, I'm beginning to suspect that something strange is going on with HttpWebRequest. When the request is sent, the server returns the following response:
HTTP/1.1 301 Moved Permanently
Date: Fri, 11 Dec 2009 17:01:04 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Location: http://www.site.com/buy/κινητή-σταθερή-τηλεφωνία/c/cn69569/
Content-Length: 112
Content-Type: text/html; Charset=UTF-8
Cache-control: private
Connection: close
Set-Cookie: BIGipServerpool_webserver_gr=1007732746.36895.0000; path=/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
As we can see, the Location header contains Greek characters that are not URL encoded. I am not quite sure whether this is valid according to the HTTP specification. What we can say for sure is that a web browser interprets it correctly.
Here comes the interesting part. It seems that HttpWebRequest doesn't use UTF-8 to decode the response headers, because when it parses the Location
header it gives: http://www.site.com/buy/κινηÏή-ÏÏαθεÏή-ÏηλεÏÏνία/c/cn69569/
, which is of course wrong. When it then tries to redirect to this location, the server responds with another redirect, and so on until the maximum number of redirects is reached and an exception is thrown.
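The effect can be reproduced in isolation. The sketch below assumes the header bytes are read with a single-byte encoding such as ISO-8859-1; that is my assumption, since I haven't confirmed which encoding HttpWebRequest actually applies. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class HeaderDecodingDemo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates a client that receives UTF-8 bytes but decodes them
    // one byte per character (assumed behavior, see the note above).
    public static string MisreadAsLatin1(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        return Latin1.GetString(utf8Bytes);
    }
}
```

For instance, τ is the two bytes CF 84 in UTF-8, so MisreadAsLatin1("τ") returns Ï followed by an invisible control character (U+0084), which is consistent with the stray Ï characters in the garbled header.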
I couldn't find any way to specify the encoding used by HttpWebRequest when parsing the response headers. If we use TcpClient manually, it works perfectly fine:
using System.IO;
using System.Net.Sockets;

using (var client = new TcpClient())
{
    client.Connect("www.site.com", 80);
    using (var stream = client.GetStream())
    {
        // Send a raw HTTP request; StreamWriter encodes as UTF-8 by default
        var writer = new StreamWriter(stream);
        writer.WriteLine("GET /default/defaultcatg.asp?catg=69569 HTTP/1.1");
        writer.WriteLine("Host: www.site.com");
        writer.WriteLine("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090805 Shiretoko/3.5.2");
        writer.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        writer.WriteLine("Accept-Language: en-us,en;q=0.5");
        writer.WriteLine("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        writer.WriteLine("Connection: close");
        // A blank line terminates the request headers
        writer.WriteLine(string.Empty);
        writer.Flush();

        // StreamReader decodes the whole response as UTF-8 by default
        var reader = new StreamReader(stream);
        var response = reader.ReadToEnd();
        // When looking at the response it correctly reads
        // Location: http://www.site.com/buy/κινητή-σταθερή-τηλεφωνία/c/cn69569/
    }
}
So I am really puzzled by this behavior. Is there any way to specify the correct encoding used by HttpWebRequest? Maybe some request header should be set?
As a workaround you could try modifying the ASP page that performs the redirect so that it URL-encodes the Location header. For example, when you call Response.Redirect(location) in an ASP.NET application, the location is URL encoded automatically and any non-ASCII characters are converted to their percent-encoded UTF-8 form.
For example, if you call Response.Redirect("http://www.site.com/buy/κινητή-σταθερή-τηλεφωνία/c/cn69569/"); in an ASP.NET application, the Location header will be set to:
http://www.site.com/buy/%ce%ba%ce%b9%ce%bd%ce%b7%cf%84%ce%ae-%cf%83%cf%84%ce%b1%ce%b8%ce%b5%cf%81%ce%ae-%cf%84%ce%b7%ce%bb%ce%b5%cf%86%cf%89%ce%bd%ce%af%ce%b1/c/cn69569
It seems that this is not the case with classic ASP.
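If the server can't be changed, another option might be to turn off automatic redirects and follow them by hand on the client. This is only a sketch under a strong assumption: that the garbled header value is the UTF-8 bytes read one byte per character, so it can be recovered by converting the characters back to bytes and decoding them as UTF-8. I haven't verified that this matches what HttpWebRequest does in every case. ManualRedirect, RepairLocation and GetFollowingRedirects are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text;

public static class ManualRedirect
{
    // Assumption: rawLocation holds UTF-8 bytes that were decoded one byte per char.
    // If that does not hold, the value is used unchanged.
    public static Uri RepairLocation(Uri baseUri, string rawLocation)
    {
        foreach (char c in rawLocation)
        {
            // A char above 0xFF cannot come from a single byte; leave the value alone.
            if (c > 0xFF) return new Uri(baseUri, rawLocation);
        }
        byte[] rawBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(rawLocation);
        string decoded = Encoding.UTF8.GetString(rawBytes);
        // U+FFFD means the bytes were not valid UTF-8, so fall back to the raw value.
        if (decoded.IndexOf('\uFFFD') >= 0) decoded = rawLocation;
        return new Uri(baseUri, decoded);
    }

    // Follows redirects manually, repairing each Location header on the way.
    public static HttpWebResponse GetFollowingRedirects(Uri url, int maxRedirects)
    {
        for (int i = 0; i <= maxRedirects; i++)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.AllowAutoRedirect = false;
            var response = (HttpWebResponse)request.GetResponse();
            int status = (int)response.StatusCode;
            string location = response.Headers["Location"];
            if (status < 300 || status >= 400 || location == null)
            {
                return response; // not a redirect: hand the response to the caller
            }
            response.Close();
            url = RepairLocation(url, location);
        }
        throw new WebException("Too many redirects.");
    }
}
```

Because the repaired value goes through Uri, its AbsoluteUri is percent-encoded, so the follow-up request asks for the same kind of URL that ASP.NET would have put in the Location header.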