I'm running the following code:

using (WebClient wc = new WebClient())
{
    string page = wc.DownloadString(URL);
    ...
}

It fetches pages from a share price website, http://www.shareprice.co.uk

If you append a company's symbol name onto the end of the URL, then a page is returned which I parse to get the latest price info etc.

e.g.

http://www.shareprice.co.uk/VOD

http://www.shareprice.co.uk/TW.

Now, my problem is that some symbols end in periods, as in the second example there. For some unknown reason, the code above has a problem retrieving these sorts of URLs.

There is no run-time error, but the page that comes back reports "Symbol could not be found" from the website itself, which suggests that something is happening to the period on the end of the URL between the call to DownloadString and the actual HTTP request.

Does anyone have any idea what might be causing this, and how to fix it?

Thanks

+1  A: 

Try adding a slash to the end, after the period. Your normal web browser will do that for you, and the WebClient class isn't that smart.

http://www.shareprice.co.uk/TW./

This worked for me as well when I typed it into the browser.

Edit - added

The following all also worked in the browser

http://www.shareprice.co.uk/TW

and

http://www.shareprice.co.uk/TW/

so it looks like you should be able to just check to see if the last character is a period, and remove it.
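A minimal sketch of that idea, assuming the site really does resolve the period-less symbol to the same company (as the browser test suggested). The SymbolUrl class and its Build method are hypothetical names, not part of any library:

```csharp
using System;

static class SymbolUrl
{
    // Hypothetical helper: builds the page URL for a symbol,
    // dropping a single trailing period if there is one.
    public static string Build(string baseUrl, string symbol)
    {
        if (symbol.EndsWith(".", StringComparison.Ordinal))
        {
            symbol = symbol.Substring(0, symbol.Length - 1);
        }

        // Avoid a double slash if the base URL already ends in one.
        return baseUrl.TrimEnd('/') + "/" + symbol;
    }
}
```

The result of SymbolUrl.Build("http://www.shareprice.co.uk", "TW.") could then be passed to wc.DownloadString in the original code.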

David Stratton
However, it does not seem to work in WebClient or WebRequest. Both of these classes convert strings to Uri. When a Uri is handed that TW. URL, it seems to remove the period. Presumably it treats the trailing period as the start of a file extension that was never completed, so it just cuts it off.
Sean
A: 

Use URL encoding, so that the "." is sent as %2E.

Rich
At first glance, one would think this would work, but it does not seem to.
Sean
Are you using Fiddler to trace the calls? Might be a good idea. When you can see how the raw request looks, it sometimes offers more clues than just plain old debugging does.
Rich
+2  A: 

It seems you have found a bug in WebClient/WebRequest, though perhaps Microsoft put that in intentionally, who knows. Either way, when you pass in TW., the Uri class translates it to TW without the period. Since WebClient and WebRequest both parse strings into a Uri, your . disappears before the request is sent.
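One way to check this on your own machine is to compare the string you pass in with what Uri reports back. This is only a sketch: UriCheck and SurvivesUri are hypothetical names, and the result for a URL ending in a period may differ between .NET versions, so no output is claimed here.

```csharp
using System;

static class UriCheck
{
    // Hypothetical helper: returns true if the Uri class leaves the
    // URL exactly as it was written, false if it rewrote it.
    public static bool SurvivesUri(string url)
    {
        Uri uri = new Uri(url);
        return uri.AbsoluteUri == url;
    }
}
```

Calling UriCheck.SurvivesUri("http://www.shareprice.co.uk/TW.") and getting false would confirm that the period is being dropped before any request is made.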

You may have to use TcpClient to get around this and roll your own web client. Any variation of this:

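// Requires: using System.IO; using System.Net.Sockets; using System.Text;
TcpClient oClient = new TcpClient("www.shareprice.co.uk", 80);

NetworkStream ns = oClient.GetStream();

// "Connection: close" asks the server to close the socket once it has
// responded, so the read loop below can detect the end of the response.
StreamWriter sw = new StreamWriter(ns);
sw.Write(
   string.Format( 
      "GET /{0} HTTP/1.1\r\nUser-Agent: {1}\r\nHost: www.shareprice.co.uk\r\nConnection: close\r\n\r\n",
           "TW.", 
           "MyTCPClient"  )
);                    
sw.Flush();

StringBuilder sb = new StringBuilder();

while (true)
{
    int i = ns.ReadByte();   // One byte at a time: inefficient but simple
    if (i == -1) break;      // Other side has closed the socket
    sb.Append( (char) i );   // Append the byte to the response collected so far
}

oClient.Close();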

This will give you a 302 redirect, so just parse the 'Location:' header out of the response and execute the above again with the new location.

HTTP/1.1 302 Found
Date: Wed, 11 Nov 2009 19:29:27 GMT
Server: lighttpd
X-Powered-By: PHP/5.2.4-2ubuntu5.7
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /TW./TAYLOR-WIMPEY-PLC
Content-type: text/html; charset=UTF-8
Content-Length: 0
Set-Cookie: SSID=668d5d0023e9885e1ef3762ef5e44033; path=/
Vary: Accept-Encoding
Connection: close
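Pulling the new location out of a response like the one above could look roughly like this. It is a sketch under assumptions: RedirectHelper and GetLocation are hypothetical names, and the raw response is assumed to be the text collected in the StringBuilder, with lines separated by \r\n.

```csharp
using System;

static class RedirectHelper
{
    // Hypothetical helper: returns the value of the Location header
    // from a raw HTTP response, or null if the header is absent.
    public static string GetLocation(string rawResponse)
    {
        // Headers end at the first blank line; ignore anything in the body.
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string headers = headerEnd >= 0 ? rawResponse.Substring(0, headerEnd) : rawResponse;

        foreach (string line in headers.Split(new[] { "\r\n" }, StringSplitOptions.None))
        {
            // Header names are case-insensitive.
            if (line.StartsWith("Location:", StringComparison.OrdinalIgnoreCase))
            {
                return line.Substring("Location:".Length).Trim();
            }
        }

        return null;
    }
}
```

The returned path, minus its leading slash, can be substituted for "TW." in the GET line of a second request.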
Sean
Perfect. This solution worked great, thanks a lot Sean.
C.McAtackney
No problem, glad it could help.
Sean