Hi all, I'm just getting started writing a simple web crawler to gather information about the links coming in to our system. I'm using HttpClient 4.x. About 100 threads fetch links and issue HEAD requests against them. It works great for the first few hours, then it slows to a crawl. I'm not sure whether I'm setting up the connection manager properly.
Here is the code I use to create the HttpClient object. Does anyone see anything in this block that raises an alarm? When I stop the server and restart it, everything is as good as new again. While it's running slowly, memory still looks fine at a steady 500K per process, so it doesn't look like I'm leaking memory.
HttpParams httpParams = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(httpParams, 5000);
HttpConnectionParams.setSoTimeout(httpParams, 5000);
ConnManagerParams.setMaxTotalConnections(httpParams, 200);
HttpProtocolParams.setVersion(httpParams, HttpVersion.HTTP_1_1);
// set request params
httpParams.setParameter("http.protocol.cookie-policy", CookiePolicy.BROWSER_COMPATIBILITY);
httpParams.setParameter("http.useragent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)");
SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
// HTTPS needs an SSL socket factory; a plain socket factory cannot negotiate TLS on port 443
schemeRegistry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));
final ClientConnectionManager cm = new ThreadSafeClientConnManager(httpParams,schemeRegistry);
HttpClient httpClient = new DefaultHttpClient(cm, httpParams);
httpClient.getParams().setParameter("http.conn-manager.timeout", 10000L);
// "http.protocol.wait-for-continue" expects an Integer (milliseconds), not a Long
httpClient.getParams().setParameter("http.protocol.wait-for-continue", 10000);
I also run this code in a separate thread to clean up expired connections, as recommended in the docs:
final Runnable cleanUp = new Runnable() {
    public void run() {
        cm.closeExpiredConnections();
        // Optionally, close connections
        // that have been idle longer than 30 sec
        cm.closeIdleConnections(30, TimeUnit.SECONDS);
    }
};
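For anyone trying to reproduce this, here is a minimal sketch of one way a runnable like the one above could be run periodically. It assumes a ScheduledExecutorService from the JDK; it is not necessarily how my crawler schedules it. The names ConnectionCleanupScheduler, ConnectionEvictor and schedule are hypothetical. ConnectionEvictor is only a stand-in that mirrors the two cleanup methods called above, so the sketch compiles without the HttpClient jar.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionCleanupScheduler {

    /** Hypothetical stand-in for the connection manager's two cleanup methods. */
    interface ConnectionEvictor {
        void closeExpiredConnections();
        void closeIdleConnections(long idleTime, TimeUnit unit);
    }

    /** Runs the cleanup on a single daemon thread, with a fixed delay between runs. */
    static ScheduledExecutorService schedule(final ConnectionEvictor cm, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "connection-cleanup");
            t.setDaemon(true); // do not keep the JVM alive just for cleanup
            return t;
        });
        Runnable cleanUp = () -> {
            try {
                cm.closeExpiredConnections();
                cm.closeIdleConnections(30, TimeUnit.SECONDS);
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs.
                System.err.println("connection cleanup failed: " + e);
            }
        };
        executor.scheduleWithFixedDelay(cleanUp, 0, period, unit);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger runs = new AtomicInteger();
        ConnectionEvictor counting = new ConnectionEvictor() {
            public void closeExpiredConnections() { runs.incrementAndGet(); }
            public void closeIdleConnections(long idleTime, TimeUnit unit) { }
        };
        // The 100 ms period is only for the demo; a real crawler would use seconds.
        ScheduledExecutorService executor = schedule(counting, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        executor.shutdownNow();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println("cleanup ran " + runs.get() + " times");
    }
}
```

The try/catch matters with scheduleWithFixedDelay: if one run throws, the executor silently cancels every later run, and the cleanup would stop without any visible error.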
UPDATE: I ran VisualVM against the remote process for an hour or so. Here's the memory graph; the memory is now used up.