Hi all, I'm just getting started writing a simple web crawler to gather information on the links coming in to our system. I'm using HttpClient 4.x. About 100 threads fetch links and issue HEAD requests against them. It works great for the first few hours, then it slows to a crawl. I'm not sure whether I'm setting up the connection manager properly.

Here is the code I use to create the HttpClient object. Does anyone see anything in this block that raises an alarm? When I stop the server and restart it, everything is as good as new again. While it's running slowly, memory still looks OK at a steady 500K per process, so it doesn't look like I'm leaking memory.

HttpParams httpParams = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(httpParams, 5000);
HttpConnectionParams.setSoTimeout(httpParams, 5000);
ConnManagerParams.setMaxTotalConnections(httpParams, 200);
HttpProtocolParams.setVersion(httpParams, HttpVersion.HTTP_1_1);

// set request params

httpParams.setParameter("http.protocol.cookie-policy", CookiePolicy.BROWSER_COMPATIBILITY);
httpParams.setParameter("http.useragent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)");


SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
// HTTPS needs an SSL socket factory (org.apache.http.conn.ssl.SSLSocketFactory)
schemeRegistry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));

final ClientConnectionManager cm = new ThreadSafeClientConnManager(httpParams,schemeRegistry);

HttpClient httpClient = new DefaultHttpClient(cm, httpParams);

httpClient.getParams().setParameter("http.conn-manager.timeout", 10000L);
// this parameter is read as an Integer, so pass an int rather than a long
httpClient.getParams().setParameter("http.protocol.wait-for-continue", 10000);

I'm also using this code in a thread to clean up expired connections, as mentioned in the docs:

final Runnable cleanUp = new Runnable() {
    public void run() {
        // Close connections whose keep-alive period has expired
        cm.closeExpiredConnections();
        // Optionally, close connections
        // that have been idle longer than 30 sec
        cm.closeIdleConnections(30, TimeUnit.SECONDS);
    }
};
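
For reference, a Runnable like the one above only helps if something runs it repeatedly. Below is a minimal sketch of one way to do that with a ScheduledExecutorService from the JDK. The class name IdleConnectionSweeper and its methods are hypothetical, not part of HttpClient; in the crawler, the task passed to start() would be the cleanUp Runnable.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not an HttpClient class): runs a cleanup task periodically.
class IdleConnectionSweeper {
    private final ScheduledExecutorService scheduler;

    IdleConnectionSweeper() {
        // Use a daemon thread so the sweeper alone never keeps the JVM alive.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "connection-sweeper");
                t.setDaemon(true);
                return t;
            }
        });
    }

    // Runs 'task' repeatedly, waiting 'period' before the first run and between runs.
    void start(final Runnable task, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel all future runs, so log and continue.
                    System.err.println("cleanup failed: " + e);
                }
            }
        }, period, period, unit);
    }

    // Stops the scheduler and interrupts a run in progress, if any.
    void stop() {
        scheduler.shutdownNow();
    }
}
```

Usage would look something like `new IdleConnectionSweeper().start(cleanUp, 5, TimeUnit.SECONDS);` where the 5-second period is an arbitrary choice for illustration.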

UPDATE: I ran VisualVM for an hour or so, and here's the memory graph of the remote process. The memory is now used up:

http://img64.imageshack.us/f/screenshot20100714at204.png/

+1  A: 

Use VisualVM (it also ships with the JDK) and monitor your application for a while over JMX. Also install the Visual GC plugin; it gives you insight into what your GC is doing (which might slow the application down a lot if there is not enough memory).

When it slows down, look at the Threads tab to see whether threads are blocked on locks. Lock contention or insufficient memory (a memory leak) is the likely problem in your case.
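
If attaching VisualVM is inconvenient, roughly the same thread-state information can be logged from inside the process. This is only a sketch using the JDK's java.lang.management API; the class ThreadStateSummary and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical diagnostic helper: summarizes what the Threads tab shows.
class ThreadStateSummary {

    // Counts live threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts =
                new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            Integer old = counts.get(info.getThreadState());
            counts.put(info.getThreadState(), old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Number of threads deadlocked on object monitors, or 0 if there are none.
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }
}
```

Logging `ThreadStateSummary.countByState()` every minute or so would show whether the share of BLOCKED or WAITING threads grows as the crawler slows down, which would point toward contention rather than memory.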

If you want to go deeper, I would recommend the YourKit Java Profiler.

adrian.tarau
A: 

I would also try tweaking the thread count to see if that makes any difference.
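
One way to try this is to make the pool size a parameter and time the same batch of work at different sizes. The sketch below is an assumption about how that could be set up: ThreadCountTrial and its methods are hypothetical names, and the fetches are simulated with Thread.sleep rather than real HEAD requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness for comparing worker thread counts.
class ThreadCountTrial {

    // Runs all jobs on a fixed pool of 'threads' workers; returns elapsed milliseconds.
    static long timeBatch(int threads, List<Callable<Void>> jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get(); // surfaces any exception thrown by a job
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    // Builds 'count' stand-in jobs that each sleep for 'millisEach' (simulated network wait).
    static List<Callable<Void>> simulatedFetches(int count, final long millisEach) {
        List<Callable<Void>> jobs = new ArrayList<Callable<Void>>();
        for (int i = 0; i < count; i++) {
            jobs.add(new Callable<Void>() {
                public Void call() throws Exception {
                    Thread.sleep(millisEach);
                    return null;
                }
            });
        }
        return jobs;
    }
}
```

With real fetch jobs in place of the simulated ones, comparing `timeBatch(25, jobs)`, `timeBatch(50, jobs)` and `timeBatch(100, jobs)` over a long run might show whether the slowdown depends on the number of threads.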

Shikhar