WebClient.DownloadString() Not Producing Exact HTML


So here's the deal. I'm creating a spider bot for a website that scans all the product pages and records the product data. I'm using C# and the WebClient class to download each page's HTML as a string. The site I'm crawling must be specially made, because the HTML returned by WebClient.DownloadString() is different from the HTML I see when I view the page source in a browser. This seems intentional, because the only information I can't get is the price.

Does anyone know a workaround for this problem or can anyone explain what is happening? Thanks.

+1 A:

It is probably using the User-Agent string to decide what content to send. The example here shows how to set the User-Agent header.

Ben Robinson 2010-05-20 20:06:41
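
For reference, a minimal sketch of the User-Agent suggestion above. The class name SpiderClient and the User-Agent value are made up for illustration; use whatever string your own browser sends. A fresh WebClient is created for each download so the sketch does not rely on the header surviving between requests.

```csharp
using System;
using System.Net;

public static class SpiderClient
{
    // Hypothetical browser-like value; replace it with the string your browser sends.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0";

    // Returns a WebClient whose next request carries a browser-like User-Agent header.
    public static WebClient Create()
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = BrowserUserAgent;
        return client;
    }

    // Downloads a page as a string using a freshly configured client.
    public static string DownloadAsBrowser(string url)
    {
        using (WebClient client = Create())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling SpiderClient.DownloadAsBrowser(pageUrl) then takes the place of a bare WebClient.DownloadString(pageUrl). This only helps if the server really does switch content on the User-Agent header; it will not run any JavaScript in the page.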

OP here. I was able to find out exactly why. Apparently the website uses an AJAX function to load sensitive data. When I do a screen scrape with WebClient.DownloadString(), I get the HTML document, but where the sensitive info should be I get a snippet of AJAX code instead. Does this help? I will post the AJAX code that is included in the HTML.

Ryan Fuentes 2010-05-20 20:26:36

<div id="product_details" style="position:relative"></div> <script language="javascript"> var rQsp = 'productId=3519'; showProductDetails(rQsp); </script>

Ryan Fuentes 2010-05-20 20:27:00

The AJAX call is being made by the function showProductDetails. You need to look at the source code of that function to find out how to scrape the data.

Ben Robinson 2010-05-20 20:38:23

That function is not present in the HTML document.

Ryan Fuentes 2010-05-20 20:42:52

It must be in a referenced JavaScript file. Look for <script> tags and pull down the .js file referenced by each src attribute. It will be in one of them.

Ben Robinson 2010-05-20 21:07:02
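
A rough sketch of the script hunt described above, in case it helps. ScriptFinder and its method names are hypothetical, and the regular expression is only a quick approximation of HTML parsing (it assumes quoted src values). It collects every external script URL from the page and then downloads each one, looking for the function name.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ScriptFinder
{
    // Captures the quoted src value of <script> tags; not a full HTML parser.
    private static readonly Regex ScriptSrc = new Regex(
        @"<script\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every external script referenced by the page.
    public static List<Uri> FindScriptUrls(string html, Uri pageUrl)
    {
        var urls = new List<Uri>();
        foreach (Match m in ScriptSrc.Matches(html))
        {
            // Resolve relative paths such as "/js/product.js" against the page URL.
            urls.Add(new Uri(pageUrl, m.Groups[1].Value));
        }
        return urls;
    }

    // Downloads each script and returns the first one whose text mentions
    // functionName, or null if none does.
    public static Uri FindDefiningScript(string html, Uri pageUrl, string functionName)
    {
        foreach (Uri url in FindScriptUrls(html, pageUrl))
        {
            using (var client = new WebClient())
            {
                string js = client.DownloadString(url);
                if (js.Contains(functionName))
                {
                    return url;
                }
            }
        }
        return null;
    }
}
```

With the page HTML already downloaded, ScriptFinder.FindDefiningScript(html, pageUrl, "showProductDetails") should point at the .js file to read. The request that function makes (its URL and parameters) is what the spider would then have to reproduce to get the price.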
