I have a scraping application that fetches around n items per minute. Currently I have only one IP address.

The site I'm scraping allows 3 connections per IP.

I'm thinking about getting a second IP, so that I can open 6 connections.

In theory I should then be able to get n items in 40 seconds, more or less.

Currently I'm using Java (commons-httpcore) to get the job done.

I'm not sure whether this is a Java question or an OS question.

My machine has two addresses, IP 1 and IP 2. How do I connect to, say, www.microsoft.com using IP 1, and then using IP 2? How can I specify which local IP a connection should use?

+2  A: 
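// Requires HttpComponents Client 4.x
import java.net.InetAddress;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.impl.client.DefaultHttpClient;

DefaultHttpClient httpclient = new DefaultHttpClient();
// Bind outgoing connections to this local address; it must be assigned to this machine.
// Note: InetAddress.getByName declares UnknownHostException.
httpclient.getParams().setParameter(
  ConnRoutePNames.LOCAL_ADDRESS,
  InetAddress.getByName("10.10.10.10")
);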

See: http://hc.apache.org/httpcomponents-client/httpclient/apidocs/index.html

David Portabella
This would only work if the server is naive enough to throttle by IP based on an address placed in the HTTP headers by the client. In other words: unlikely.
matt b
Most well-written applications sit behind NetScalers or some router configuration that looks up your actual IP address (not whatever you set in a header field). I work in the gambling industry, where doing this is a legal requirement (to block bets from some countries).
Calm Storm
@david I think you've mixed some HttpClient 3.x with 4.x there. @matt @calm Using `ConnRoutePNames.LOCAL_ADDRESS` doesn't set any header; it sets the local IP address that the connection is made from (internally using `Socket.bind(SocketAddress)`).
sfussenegger
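To illustrate the `Socket.bind(SocketAddress)` point from the last comment, here is a minimal sketch using only `java.net`, with no HttpClient involved. The class name `LocalBindSketch` and the helper `connectFrom` are made up for this example, and the demo uses the loopback address so that it runs on any machine; on the asker's box you would pass IP 1 or IP 2 instead, assuming both are actually assigned to a local interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LocalBindSketch {

    // Hypothetical helper: opens a TCP connection to remoteHost:remotePort
    // whose source address is localIp. localIp must belong to this machine,
    // otherwise bind() fails with a BindException.
    static Socket connectFrom(InetAddress localIp, String remoteHost,
                              int remotePort, int timeoutMs) throws IOException {
        Socket socket = new Socket();
        try {
            // Port 0 lets the OS pick any free ephemeral source port.
            socket.bind(new InetSocketAddress(localIp, 0));
            socket.connect(new InetSocketAddress(remoteHost, remotePort), timeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback-only demo: a throwaway listener stands in for the remote site.
        InetAddress local = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, local);
             Socket client = connectFrom(local, local.getHostAddress(),
                                         server.getLocalPort(), 2000)) {
            System.out.println("source address: "
                    + client.getLocalAddress().getHostAddress());
        }
    }
}
```

Since the source address is chosen at the socket level, the remote server sees the bound IP as the peer address of the TCP connection; nothing is written into the HTTP headers.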