ansaurus

Question

Use HttpWebRequest to download web pages without key sensitive issues

Answer 1

+1 A:

[update: I don't know why, but both examples below now work fine! Originally I was also seeing a 403 on the page2 example. Maybe it was a server issue?]

First, WebClient is easier. Actually, I've seen this before. It turned out to be case sensitivity in the URL when accessing Wikipedia; try ensuring that you have used the same case in your request to Wikipedia.

[updated] As Bruno Conde and gimel observe, using %27 should help make it consistent (the intermittent behaviour suggests that some Wikipedia servers may be configured differently from others).

I've just checked, and in this case the case issue doesn't seem to be the problem... however, if it worked (it does~~n't~~), this would be the easiest way to request the page:

        using (WebClient wc = new WebClient())
        {
            string page1 = wc.DownloadString("http://en.wikipedia.org/wiki/Algeria");

            string page2 = wc.DownloadString("http://en.wikipedia.org/wiki/%27Abadilah");
        }

~~I'm afraid I can't think what to do about the leading apostrophe that is breaking things...~~

Marc Gravell 2008-11-09 13:20:39

I just tried the above code and it worked fine for both page1 and page2. What error were you receiving?

duckworth 2008-11-09 13:38:30

@duckworth - OK, that is odd. When I posted, I was getting 403 on page2, but now it works! Maybe it was a server issue in the first place!

Marc Gravell 2008-11-09 13:47:00

There is a pattern: when I make the request in C# it fails, but if I open the page in the browser first and then make the C# request, it sometimes works. I don't know where the problem is. It's weird...

Ifx64 2008-11-13 10:42:54

@Haytham El-Fadeel: maybe it works if it can get it from the cache, but doesn't work for vanilla requests?

Marc Gravell 2008-11-13 12:57:22

Answer 2

+1 A:

I also got strange results ... First, the

http://en.wikipedia.org/wiki/'Abadilah

didn't work and after some failed tries it started working.

The second url,

http://en.wikipedia.org/wiki/'t_Zand_(Alphen-Chaam)

always failed for me...

The apostrophe seems to be responsible for these problems. If you replace it with

%27

all URLs work fine.
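As a minimal sketch of that replacement, the helper below swaps each literal apostrophe for %27 before the URL is requested. The names `WikiUrl` and `EncodeApostrophes` are hypothetical and only for illustration. An explicit `string.Replace` is used here so the result does not depend on which characters a built-in encoder chooses to escape.

```csharp
using System;

static class WikiUrl
{
    // Hypothetical helper (not from the original answer): replaces every
    // literal apostrophe in the URL with its percent-encoded form, %27.
    public static string EncodeApostrophes(string url)
    {
        return url.Replace("'", "%27");
    }
}

static class Program
{
    static void Main()
    {
        // The two URLs mentioned in this answer.
        Console.WriteLine(WikiUrl.EncodeApostrophes("http://en.wikipedia.org/wiki/'Abadilah"));
        Console.WriteLine(WikiUrl.EncodeApostrophes("http://en.wikipedia.org/wiki/'t_Zand_(Alphen-Chaam)"));
    }
}
```

The encoded string can then be passed to WebClient.DownloadString or WebRequest.Create as usual.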

bruno conde 2008-11-09 13:56:45

I tried to encode it as %27 using HttpUtility.UrlPathEncode, but it didn't work.

Ifx64 2008-11-13 10:41:06

Answer 3

+1 A:

Try escaping the special characters using percent-encoding (RFC 3986, section 2.1). For example, a single quote is represented by %27 in the URL (IRI).

gimel 2008-11-09 13:59:07

Answer 4

+1 A:

I'm sure the OP has this sorted by now, but I've just run across the same kind of problem: intermittent 403s when downloading from Wikipedia via a web client. Setting a user agent header sorts it out:

        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
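For context, here is a hedged, self-contained sketch of how that line might fit into a complete download. It assumes `client` is a `WebClient`; the `WikiClient` class and its `Create` method are hypothetical names, and the user-agent string is simply the one quoted above. The header is set immediately before a single request, and any failure (such as a 403) is reported through the `WebException`.

```csharp
using System;
using System.Net;

static class WikiClient
{
    // User-agent string copied from the answer above.
    public const string UserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";

    // Hypothetical factory: returns a WebClient with the user-agent header set.
    public static WebClient Create()
    {
        var client = new WebClient();
        client.Headers.Add("user-agent", UserAgent);
        return client;
    }
}

static class Program
{
    static void Main()
    {
        using (WebClient client = WikiClient.Create())
        {
            try
            {
                string page = client.DownloadString("http://en.wikipedia.org/wiki/%27Abadilah");
                Console.WriteLine("Downloaded {0} characters", page.Length);
            }
            catch (WebException ex)
            {
                // An HTTP error such as 403 arrives as a WebException whose
                // Response is an HttpWebResponse carrying the status code.
                var response = ex.Response as HttpWebResponse;
                Console.WriteLine(response != null
                    ? "HTTP " + (int)response.StatusCode
                    : ex.Message);
            }
        }
    }
}
```

With HttpWebRequest, as in the question title, the equivalent setting would be the request's `UserAgent` property rather than the header collection.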

Martynnw 2009-12-26 20:34:57

Setting the UA fixed this for me too. Is it an anti-spamming "feature" of Wikipedia?

Ben Aston 2010-03-12 03:00:15
