Hi,
I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search turns up a whole bunch of Java libraries for building a web crawler. Nutch is of course a very robust package, but it seems a bit too advanced for my needs. I only need to crawl a handful of websites a week, each containing a couple of thousand pages.
Which open-source Java library would you recommend, considering:
- speed
- multithreading (or even distributed)
- ease of extending it with new functionality
- active maintenance
- and documentation?