As far as I can tell, you need a separate `URLConnection` for each URL (which makes sense, since the underlying network connection must change as well). I seriously doubt that creating this object is your bottleneck; I suspect the network time dominates, but without profiling it is hard to know for certain.
For a moderate number of pages, I would consider a worker queue (say, using an `ExecutorService`). For a large number of pages, I might even look into a Java version of Map/Reduce.
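The worker-queue approach can be sketched roughly like this (a minimal illustration, not a production scraper: it assumes Java 9+ for `InputStream.readAllBytes`, and the URLs in `main` are placeholders):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PageFetcher {
    // One URLConnection per URL -- they are cheap to create;
    // the real cost is the network round trip.
    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Submit each URL as a task to a fixed-size pool so several
    // pages download concurrently instead of one after another.
    static Map<String, String> fetchAll(List<String> urls, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String url : urls) {
                futures.put(url, pool.submit(() -> fetch(url)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URLs -- substitute the pages you are scraping.
        List<String> urls = List.of("https://example.com/", "https://example.org/");
        Map<String, String> pages = fetchAll(urls, 4);
        pages.forEach((u, body) -> System.out.println(u + " -> " + body.length() + " chars"));
    }
}
```

Tuning the pool size is the main knob: a handful of threads is usually enough, since the tasks are I/O-bound and spend most of their time waiting on the network.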
Edit: For Map/Reduce to be better than a simple worker queue, you need to have multiple computers available to do the scraping.