I need to crawl a web board that uses Ajax to update, hide, and show comments dynamically without reloading the post. The comment area is where I'm stuck.

In the Ajax request, the URL is given as a path without a host name:

new Ajax('/bbs/comment_db/load.php', {
    update       : $('comment_result'), 
    evalScripts  : true, 
    method       : 'post', 
    data         : 'id=work_gallery&no=i7dg&sno='+npage+'&spl='+splno+'&mno='+cmx+'&ksearch='+$('ksearch').value,
    onComplete   : function() {
        $('cmt_spinner').setStyle('display','none');  
        try { 
            $('cpn'+npage).setStyle('fontWeight','bold'); 
            $('cpf'+npage).setStyle('fontWeight','bold');
        } catch(err) {} 
    }
}).request();
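For reference, the data string in the call above is an ordinary form-encoded POST body. Here is a minimal sketch of building the same body with URLSearchParams. buildCommentBody is a hypothetical helper, and its arguments stand in for the page's npage, splno, cmx and ksearch values. Unlike the string concatenation above, URLSearchParams also percent-encodes the search term, so characters such as & or spaces can't break the body.

```javascript
// Hypothetical helper: builds the same form-encoded body as the `data`
// string in the Ajax call. The arguments stand in for the page's own
// npage, splno, cmx and $('ksearch').value.
function buildCommentBody(npage, splno, cmx, ksearch) {
  const params = new URLSearchParams({
    id: "work_gallery",
    no: "i7dg",
    sno: String(npage),
    spl: String(splno),
    mno: String(cmx),
    ksearch: ksearch, // encoded for us: space -> "+", "&" -> "%26"
  });
  return params.toString();
}

console.log(buildCommentBody(2, 0, 1, "cats & dogs"));
// id=work_gallery&no=i7dg&sno=2&spl=0&mno=1&ksearch=cats+%26+dogs
```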

If I use the full host name in the URL instead, I only get the message "Permission Error":

new Ajax('http://host.name.com/bbs/comment_db/load.php', {
    update      : $('comment_result'), 
    evalScripts : true, 
    method      : 'post', 
    data        : 'id=work_gallery&no=i7dg&sno='+npage+'&spl='+splno+'&mno='+cmx+'&ksearch='+$('ksearch').value,
    onComplete  : function() {
        $('cmt_spinner').setStyle('display','none');  
        try { 
            $('cpn'+npage).setStyle('fontWeight','bold'); 
            $('cpf'+npage).setStyle('fontWeight','bold');
        } catch(err) {} 
    }
}).request();

The same happens when I open the PHP URL directly in the browser: http://host.name.com/bbs/comment_db/load.php?'id=work_gallery&..'

My guess is that the PHP script only accepts calls from pages on the same host.

Any ideas on how to crawl this data?

Thanks in advance.

-- Shin

A: 
method:'post'

might well be your problem: the server handling the request probably rejects GET requests, and GET is all you can send from a browser's address bar. If that's what is happening, you'll need to find or install a scripting tool that can send POST requests (Perl would be my choice, and unless you're running Windows, you'll already have it).
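To illustrate the GET/POST difference, here is a sketch that builds the same parameters both ways. If load.php reads its input from PHP's $_POST (an assumption; the thread doesn't show the script), parameters that the address bar puts in the query string never reach it. ENDPOINT is the placeholder URL from the question, and toRequest is a hypothetical helper whose return value has the shape that fetch(url, init) expects.

```javascript
// Sketch: the same parameters sent as GET (what an address bar does)
// and as POST (what the Ajax call does). ENDPOINT is the placeholder
// from the question; toRequest is a hypothetical helper.
const ENDPOINT = "http://host.name.com/bbs/comment_db/load.php";

function toRequest(method, params) {
  const query = new URLSearchParams(params).toString();
  if (method === "GET") {
    // Parameters travel in the URL; there is no request body.
    return { url: `${ENDPOINT}?${query}`, init: { method: "GET" } };
  }
  // Parameters travel in a form-encoded body, as with method:'post'.
  return {
    url: ENDPOINT,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: query,
    },
  };
}

const params = { id: "work_gallery", no: "i7dg", sno: "1" };
console.log(toRequest("GET", params).url);
// http://host.name.com/bbs/comment_db/load.php?id=work_gallery&no=i7dg&sno=1
console.log(toRequest("POST", params).init.body);
// id=work_gallery&no=i7dg&sno=1
```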

I do have to wonder whether what you're trying to do is legit, though: trawling other sites' comment databases isn't usually encouraged.

Pete Jordan
I tried sending it as POST from a form tag, but got the same result. A year ago this site didn't use Ajax, and I had no problems crawling it. The site is an online photo-sharing community, and there is no restriction notice about comments, though it does warn against copying photos without notifying the owner.
We use the data for research purposes.
A:

Cross-site XMLHttpRequests are forbidden by most browsers. If you want to crawl other sites, you need to do it from a server-side script.
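A server-side script is simply a program running outside the browser, where the same-origin policy doesn't apply. Below is a minimal sketch for Node.js 18+ (which ships a global fetch). The endpoint and parameter names come from the question; fetchCommentPage is a hypothetical name, and the X-Requested-With header is only a guess at what the board might check, since nobody in the thread confirms why load.php answers "Permission Error". The fetchImpl parameter lets you swap in a stub for testing.

```javascript
// Server-side crawler sketch for Node.js 18+ (global fetch available).
// Outside the browser, the same-origin policy does not apply.
// ASSUMPTIONS: endpoint and parameters from the question; the
// X-Requested-With header is a guess at what load.php may check.
const ENDPOINT = "http://host.name.com/bbs/comment_db/load.php";

async function fetchCommentPage(page, fetchImpl = fetch) {
  // Reduced version of the page's POST body (spl, mno and ksearch are
  // omitted; whether load.php requires them is unknown).
  const body = new URLSearchParams({
    id: "work_gallery",
    no: "i7dg",
    sno: String(page),
  }).toString();

  const res = await fetchImpl(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      "X-Requested-With": "XMLHttpRequest", // hypothetical check
    },
    body,
  });
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for page ${page}`);
  }

  const html = await res.text();
  // The question reports this message; treat it as a refusal.
  if (html.includes("Permission Error")) {
    throw new Error(`Board refused page ${page}: Permission Error`);
  }
  return html; // HTML fragment that would normally fill #comment_result
}

// Usage: fetchCommentPage(1).then(console.log, console.error);
```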

Darin Dimitrov
I'd really appreciate it if you could explain the server-side script a little more. By the way, calling the PHP script directly, without an Ajax request, gives the same result: Permission Error.
A:

As Darin mentioned, the XMLHttpRequest object (the core of Ajax requests) has security restrictions on making cross-site HTTP requests; this is known as the same-origin policy.
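The same-origin policy compares the scheme, host and port of two URLs. A small sketch using the standard URL class shows the rule; the page URL is a hypothetical example. It also shows why the question's relative path '/bbs/comment_db/load.php' is always same-origin: it resolves against the page that issues the request.

```javascript
// Two URLs share an origin only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Hypothetical page on the board that issues the Ajax call.
const page = "http://host.name.com/bbs/view.php";

// A relative path resolves against the page, so it is always same-origin.
const resolved = new URL("/bbs/comment_db/load.php", page).href;
console.log(resolved); // http://host.name.com/bbs/comment_db/load.php
console.log(sameOrigin(page, resolved)); // true
console.log(sameOrigin(page, "https://host.name.com/bbs/comment_db/load.php")); // false: scheme differs
console.log(sameOrigin(page, "http://host.name.com:8080/bbs/comment_db/load.php")); // false: port differs
console.log(sameOrigin(page, "http://other.example.com/load.php")); // false: host differs
```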

Although a W3C working group has proposed a new Access Control for Cross-Site Requests recommendation, the restriction is still in effect in most mainstream browsers.
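That proposal is what later became CORS (Cross-Origin Resource Sharing): the remote server opts in by sending an Access-Control-Allow-Origin response header. Below is a simplified sketch of the browser's decision for a simple request; corsAllows is a hypothetical name, and preflight and credential details are deliberately left out. It also shows why this doesn't help a crawler: the header has to come from the board's server, which you don't control.

```javascript
// Simplified sketch of the browser's CORS decision for a "simple" request
// (e.g. a form-encoded POST with no custom headers). Real browsers also run
// preflight checks and, for credentialed requests, require
// Access-Control-Allow-Credentials: true; both are omitted here.
function corsAllows(pageOrigin, allowOriginHeader, withCredentials = false) {
  if (allowOriginHeader == null) {
    return false; // no header: the response is withheld from the script
  }
  if (allowOriginHeader === "*") {
    return !withCredentials; // wildcard is not honored when cookies are sent
  }
  return allowOriginHeader === pageOrigin; // must match exactly
}

console.log(corsAllows("http://my.crawler.example", null)); // false
console.log(corsAllows("http://my.crawler.example", "*")); // true
console.log(corsAllows("http://my.crawler.example", "http://host.name.com")); // false
```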

I found some information on the Mozilla Developer Network that may provide a better explanation.

In your case, the code appears to use MooTools rather than Prototype (the new Ajax(...).request() syntax and setStyle('display','none') calls match MooTools 1.1), and its Ajax class still uses the XMLHttpRequest object for its requests.

Asciant
A: 

I would solve this by running a PHP script locally that fetches the outside pages. That way, the JavaScript in the browser never has to request an outside domain.
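The same idea can be sketched in Node.js instead of PHP (a swapped-in language, not what the answer describes): a small local proxy that the page's JavaScript calls on its own origin, and that forwards the form-encoded body to the remote load.php. The /comments route and createProxy name are hypothetical, UPSTREAM is the placeholder URL from the question, and Node.js 18+ is assumed for the global fetch.

```javascript
// Minimal same-origin proxy sketch (Node.js 18+ for the global fetch).
// ASSUMPTIONS: UPSTREAM is the placeholder from the question; the
// /comments route and createProxy name are hypothetical.
const http = require("http");

const UPSTREAM = "http://host.name.com/bbs/comment_db/load.php";

function createProxy(fetchImpl = fetch) {
  return http.createServer(async (req, res) => {
    if (req.method !== "POST" || req.url !== "/comments") {
      res.writeHead(404).end();
      return;
    }

    // Collect the form-encoded body sent by the page's Ajax call.
    req.setEncoding("utf8");
    let body = "";
    for await (const chunk of req) body += chunk;

    try {
      // Forward the body unchanged to the remote script as a POST.
      const upstream = await fetchImpl(UPSTREAM, {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body,
      });
      // Relay raw bytes and the upstream Content-Type so the board's own
      // character encoding survives the round trip.
      const type = upstream.headers.get("content-type") || "text/html";
      const bytes = Buffer.from(await upstream.arrayBuffer());
      res.writeHead(upstream.status, { "Content-Type": type });
      res.end(bytes);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end(`Upstream request failed: ${err.message}`);
    }
  });
}

// Usage: createProxy().listen(8080);
```

A page served from the same origin as the proxy can then point its Ajax URL at '/comments' and stay within the same-origin policy.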

patrick