I have a crawler that fetches URLs and parses the HTML, and I've come across an unusual error. For a specific set of URLs from one site, fetching with HttpWebRequest and HttpWebResponse gives me the error

> **The remote server returned an error: (404) Not Found**

This is unusual, since the same URLs work when I paste them into my browser. Any ideas appreciated. I'm not sure whether I need to post code, but let me know if so.

+1  A: 

The site could be blocking your user-agent, or it could require cookies.

David
I tried changing the user agent and that did not work. How do I enable cookies from within the program?
vbNewbie
OK, I'm going to try the CookieContainer class.
vbNewbie
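For anyone following along, here is a minimal sketch of what the cookie suggestion might look like. It assumes VB.NET (the thread never says which .NET language is in use), and `CrawlerSketch`, `BuildRequest`, `Fetch`, and the User-Agent string are made-up names and values for illustration, not anything from the original crawler. The idea is to share one `CookieContainer` across all requests so that cookies set by an earlier response are sent back on later ones. Whether that fixes the 404 on this particular site is not something the thread confirms.

```vb
Imports System.IO
Imports System.Net

Module CrawlerSketch
    ' One container shared by every request, so cookies set by an
    ' earlier response are sent back on later requests.
    Private ReadOnly Cookies As New CookieContainer()

    ' Hypothetical helper: builds a request without sending it.
    Public Function BuildRequest(url As String) As HttpWebRequest
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.CookieContainer = Cookies
        ' Placeholder value; no User-Agent header is sent unless one is set.
        request.UserAgent = "Mozilla/5.0 (compatible; ExampleCrawler/1.0)"
        Return request
    End Function

    ' Sends the request and returns the response body as text.
    ' GetResponse throws a WebException for error statuses such as 404.
    Public Function Fetch(url As String) As String
        Using response As HttpWebResponse = DirectCast(BuildRequest(url).GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

Calling `CrawlerSketch.Fetch(url)` for each URL in turn reuses the same cookies throughout the crawl. If the request still fails, catching the `WebException` and inspecting its `Response` property is one way to see what the server actually sent back.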
+1  A: 

Could it be that the remote server is serving different pages depending on the User-Agent, and that it doesn't have a page that corresponds to the User-Agent value provided by the HttpWebRequest instance (empty by default)? Just a thought, since you say that the page can be found when navigating to its address with the browser but not through code.

Anders Fjeldstad
Thank you for the response. I was not sure exactly what you meant, but I did try switching user agents and it did not work. It always gets the first URL, and then I get the error on the ones after it. Is the site blocking me? That would be odd, since I checked its robots.txt.
vbNewbie