Hi Guys,

I've written a Python application that makes web requests using the urllib2 library and then scrapes the returned data. I could deploy it as a web application, but then all urllib2 requests would go through my web server. With many users generating a high number of requests, the server's IP risks being banned. The other option is to create a desktop application, which I don't want to do. Is there any way I could deploy my application so that the web requests are made from the client side? One option was to use Jython to create an applet, but I've read that Java applets can only make web requests to the server they are deployed on, and that the only way to circumvent this is a server-side proxy, which leads us back to the problem of the server's IP getting banned.

This might sound like an impossible situation, and I'll probably end up creating a desktop application, but I thought I'd ask if anyone knew of an alternative solution.

Thanks.

+1  A: 

You can probably use AJAX requests made from JavaScript running on the client side.

  • Use server → client communication to give the client commands and the data it needs to make a request
  • …and then use AJAX communication from the client to the 3rd party server.
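
To make the server → client half of this concrete, here is a minimal Python sketch of what the server side might look like. Everything in it is an assumption for illustration: the names `pending_urls`, `next_job` and `store_result`, the JSON field names, and the example.com URLs are all hypothetical, and the client-side AJAX call itself would still have to be written in JavaScript and is subject to the browser's cross-site restrictions.

```python
import json
from collections import deque

# Hypothetical in-memory queue of pages the scraper still needs.
pending_urls = deque([
    "http://example.com/page/1",
    "http://example.com/page/2",
])

# Hypothetical store for the HTML that clients send back, keyed by URL.
results = {}


def next_job():
    """Return the JSON command the server hands to a browser client."""
    if not pending_urls:
        return json.dumps({"action": "idle"})
    return json.dumps({"action": "fetch", "url": pending_urls.popleft()})


def store_result(body):
    """Accept the JSON a client posts back after fetching the page itself.

    Returns the number of pages collected so far.
    """
    data = json.loads(body)
    results[data["url"]] = data["html"]
    return len(results)
```

The point of this shape is that the server only hands out small commands and collects results, so the requests to the 3rd party site come from each user's own IP rather than from the server.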
nailxx
Unfortunately, AJAX means dropping Python for JavaScript, but that's the only sensible way of offloading the traffic to the client.
Adrien Plisson
Actually, I have experience running Python on the client side using Silverlight + DLR + IronPython, but that means the client must have Silverlight installed. So it is better to stick to JavaScript. After all, JS is not such a bad language once you get the idea.
nailxx
I've tried looking this up online, but most sources say that JS doesn't support cross-site requests. This, too, is usually circumvented with a server-side proxy. Isn't there a way to do it without a proxy?
Mridang Agarwalla
@mridang. Well, I'm afraid there is no way to make such calls without a proxy if the author of the service you want to query doesn't allow it by design, i.e. doesn't provide something like JSONP to make cross-domain calls possible.
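
To show why JSONP needs the service author's cooperation, here is a rough Python sketch of what a JSONP-enabled service does on its side: it wraps its JSON in a call to a function name supplied by the caller, so the browser can load the response as a script. The function name `jsonp_response` and the callback-name check are my own assumptions, not any particular service's API.

```python
import json
import re

# Only allow plain (optionally dotted) JavaScript identifiers as callback
# names, so the caller cannot inject arbitrary script into the response.
_CALLBACK_RE = re.compile(
    r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$"
)


def jsonp_response(callback, payload):
    """Wrap a JSON-serializable payload in a call to `callback` (hypothetical helper)."""
    if not _CALLBACK_RE.match(callback):
        raise ValueError("invalid callback name")
    return "%s(%s);" % (callback, json.dumps(payload))


print(jsonp_response("handleData", {"title": "example"}))
# prints: handleData({"title": "example"});
```

If the site you want to scrape does not serve its data in a form like this, the browser will not let your page read the response.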
nailxx
A: 

This depends on the form of "scraping" you intend to do:

Check out diggstripper on Google Code.

Hans
+1  A: 

You can use a signed Java applet; signed applets can use the Java security mechanism to gain access to any site. This tutorial explains exactly what you have to do: http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html

The same might be possible from a Flash applet. JavaScript is also restricted to the site it was published on and, AFAIK, cannot be signed or granted security exceptions like this.

wump
Hi wump. This seems to work. Cheers.
Mridang Agarwalla