



I have browsed through many posts on this and tried some of the suggestions, but I still don't fully understand it. I would like to scrape HTML pages that run a script which only displays a link after a click. Some posts mentioned Firebug and others talked about reverse engineering the code I need. But after trying reverse engineering, I still don't see how to get the data after tracing the script function.

        function() {
            var categoryList = jQuery('#category-list');
            categoryList.css('top', jQuery(this).offset().top+43);
            jQuery('.category-selector img').attr('src', '/images/up_arrow.png');
        function() {
            var categoryList = jQuery('#category-list');
            jQuery('.category-selector img').attr('src', '/images/down_arrow.png');

    jQuery('.category-item a').click(

            idToShow = jQuery(this).attr('id').substr(9);
            hideAllExcept(jQuery('#category_' + idToShow));
            jQuery('.category-item a').removeClass('activeLink');

I am using and some sites were easy using Firebug: by looking at the script I was able to pull the data that I needed. What would I do in this scenario? The link is and the categories are what I am trying to access. Notice the URL does not change. I appreciate any responses.

+1  A: 

My best suggestion would be to use Selenium for screen scraping. It is normally used for automated website testing but would fit your case well. I've used it to screen scrape AJAX pages on multiple occasions where the page was heavily JavaScript dependent.

You can write your screen scraping code in .NET, and it can drive Firefox or IE to do the scraping.

With Selenium, you record a screen scraping session with the Selenium IDE in Firefox (look for the Firefox extension in the link above). That session can be exported as either an HTML template or C# code. It might be able to output VB as well.

You then copy the C# or VB.NET output from the recorded session into a Selenium .NET project that you create, and run that project through NUnit.

I'd suggest looking online for some help with getting Selenium started and working but this should get you on your way.
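
As a side note on the reverse-engineering route: the click handler in the question only takes the link's id, drops the first 9 characters, and shows the element `#category_<rest>`. That suggests the category content may already be in the downloaded HTML and is merely hidden, in which case no browser would be needed. Here is a minimal sketch of that mapping in plain JavaScript. The function names, the `cat_link_` id prefix, and the sample markup are all assumptions made up for illustration; the real page's ids would need to be checked.

```javascript
// Hypothetical helper mirroring the handler's
//   idToShow = jQuery(this).attr('id').substr(9);
// It maps a link id to the id of the panel that the click would reveal.
function panelIdForLink(linkId) {
  return 'category_' + linkId.substring(9);
}

// Hypothetical helper: collect the id of every <a ... id="..."> tag in raw
// HTML and map each one to its panel id. The regex is deliberately crude
// (it would also match attributes such as data-id); a real crawler should
// use a proper HTML parser instead.
function panelIdsFromHtml(html) {
  const ids = [];
  const linkPattern = /<a\b[^>]*\bid="([^"]+)"/g;
  let match;
  while ((match = linkPattern.exec(html)) !== null) {
    ids.push(panelIdForLink(match[1]));
  }
  return ids;
}

// Made-up sample markup; the 9-character prefix "cat_link_" is an assumption.
const sampleHtml = `
  <div class="category-item"><a id="cat_link_42" href="#">Books</a></div>
  <div class="category-item"><a id="cat_link_7" href="#">Music</a></div>
`;

const panelIds = panelIdsFromHtml(sampleHtml);
console.log(panelIds);
```

If the panels turn out to be empty in the raw HTML and are filled in by a separate request, this shortcut does not apply and a browser-driven tool such as Selenium is the safer choice.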

Paul Mendoza
Thank you for the response. I will definitely look into Selenium. I just wanted to ask whether you think there may be another approach that can be used within Visual Studio, since my crawler will access a bunch of URLs at once, and if a site contains these types of scripts I will need to implement a different function for that site within the same program. I do appreciate your response.