views: 1356

answers: 5

Hello,

I want to manipulate the HTML of a given URL, something like HTML scraping. I know this can be done with curl or a scraping library. But is it possible to use jQuery to make a GET request to the URL via Ajax, retrieve the HTML of the page, and run jQuery code on the returned HTML?

Thank You

A: 

http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/

The only problem is that, due to security restrictions in both Internet Explorer and Firefox, the XMLHttpRequest object is not allowed to make cross-domain, cross-protocol, or cross-port requests.

knoopx
+1  A: 

You cannot make an Ajax request to a domain name other than the one your website is on, because of the Same Origin Policy. This means you will not be able to do what you want, at least not directly.

A solution would be to:

  • have some kind of "proxy" on your own server,
  • send your Ajax request to that proxy,
  • which, in turn, will fetch the page on the other domain name and return it to your JS code as the response to the Ajax request.

This can be done in a couple of lines in almost any language (PHP with curl, for instance). Or you might be able to use some functionality of your web server (see mod_proxy and mod_proxy_http for Apache, for instance).
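As a rough illustration of the proxy idea, here is a minimal sketch in JavaScript for Node.js 18 or later (which provides a global fetch). It is not taken from any particular project: the /proxy path, the url query parameter, the ALLOWED_HOSTS list, and the function names extractTarget and createProxyServer are all made up for this example. The allow-list is an added assumption, so that the proxy only fetches hosts you have chosen.

```javascript
// Minimal same-origin proxy sketch (assumes Node.js 18+ for the global fetch).
// The /proxy path, the "url" query parameter, and ALLOWED_HOSTS are
// hypothetical names chosen for this example.
const http = require("node:http");

// Only hosts listed here may be fetched, so the proxy cannot be used
// to reach arbitrary or internal addresses.
const ALLOWED_HOSTS = new Set(["stackoverflow.com"]);

// Returns the validated target URL from a request path such as
// "/proxy?url=http%3A%2F%2Fstackoverflow.com%2Fquestions",
// or null if the request is unusable.
function extractTarget(requestUrl) {
  try {
    const parsed = new URL(requestUrl, "http://localhost");
    if (parsed.pathname !== "/proxy") return null;
    const raw = parsed.searchParams.get("url");
    if (!raw) return null;
    const target = new URL(raw); // throws if raw is not an absolute URL
    const okProtocol = target.protocol === "http:" || target.protocol === "https:";
    if (!okProtocol || !ALLOWED_HOSTS.has(target.hostname)) return null;
    return target.href;
  } catch (err) {
    return null;
  }
}

// Builds the server without starting it; call .listen(port) to run it.
function createProxyServer() {
  return http.createServer(async (req, res) => {
    const target = extractTarget(req.url);
    if (target === null) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Bad or disallowed url parameter");
      return;
    }
    try {
      // Fetch the remote page server-side and hand its HTML back to the browser.
      const upstream = await fetch(target);
      const html = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(html);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream fetch failed");
    }
  });
}

// Example: createProxyServer().listen(8080);
```

If the proxy is served from the same origin as your page, the browser side could then call something like `$.get("/proxy", { url: target }, callback)` and, inside the callback, wrap the returned string with `$("<div>").append($.parseHTML(html))` before running selectors on it. `$.parseHTML` requires jQuery 1.8 or later.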

Pascal MARTIN
A: 

Instead of curl, you could use a tool like Selenium which will automate loading the page in the browser. You can run JavaScript with it.

Annie
+3  A: 

I would like to point out that there are situations where it is perfectly acceptable to use jQuery to scrape screens across domains. Windows Sidebar gadgets run in a 'Local Machine Zone' that allows cross-domain scripting.

And jQuery does have the ability to apply selectors to retrieved HTML content. You just need to add the selector to the load() method's url parameter, after a space.

The example gadget code below checks this page every hour and reports the total number of page views.

<html>
<head>
    <script type="text/javascript" src="jquery.min.js"></script>
    <style>
        body { 
            height: 120px;
            width: 130px;
            background-color: white;
        }
    </style>
</head>

<body>
Question Viewed:
<div id="data"></div>

<script type="text/javascript">

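    var url = "http://stackoverflow.com/questions/1936495/website-scraping-using-jquery-and-ajax";

    // Load the view count once the DOM is ready, then refresh it every hour.
    $(document).ready(function(){
        updateGadget();
        var intervalID = setInterval(updateGadget, 60 * 60 * 1000);
    });

    // Fetch the page and insert only the elements that match the selector
    // following the space in the load() argument.
    function updateGadget(){
        $("#data").load(url + " .label-value:contains('times')");
    }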

</script>

</body>
</html>
Alex
A: 

Doesn't Greasemonkey also enable this?

Hamster