Hey, I am starting a project to study the relationship between the characters that appear in images and the web pages those images are on. So first I want to crawl some images together with their web pages, and I need to save the crawl results to local disk for further analysis.

Is there any open source software for this?

Thanks ^_^

+1  A: 

Here's a list of open source crawlers: http://www.google.co.uk/#hl=en&source=hp&q=open+source+web+crawler&aq=f&aqi=g9g-m1&aql=&oq=&gs_rfai=&fp=77130048d7e0701a

Java crawlers are near the top of the list, and the Wikipedia article has some more as well.

Tom Gullen
A: 

You can use crawler4j for this. It is a simple Java crawler that can be configured in a few minutes, and you can use it to crawl images as well. There is also an ImageCrawler example in the source code.
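To give a rough idea of what saving a page together with its images involves, here is a minimal sketch. It does not use crawler4j; it is plain Python with only the standard library, and the names `ImageCollector`, `extract_image_urls`, and `save_page_with_images`, as well as the `page.html` / `img_N` file layout, are made up for illustration. It handles a single page and assumes the HTML is roughly UTF-8; a real crawler would also follow links, respect robots.txt, and throttle requests.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img src="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in html, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


def save_page_with_images(page_url, out_dir):
    """Download one page and its images into out_dir (hypothetical layout)."""
    os.makedirs(out_dir, exist_ok=True)
    with urllib.request.urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Keep the page text next to its images so both can be analysed together.
    with open(os.path.join(out_dir, "page.html"), "w", encoding="utf-8") as f:
        f.write(html)
    for i, img_url in enumerate(extract_image_urls(html, page_url)):
        ext = os.path.splitext(urlparse(img_url).path)[1] or ".bin"
        urllib.request.urlretrieve(img_url, os.path.join(out_dir, f"img_{i}{ext}"))
```

Calling `save_page_with_images(some_url, "crawl/page_0")` would leave one folder per page holding the HTML and its images, which keeps each image paired with the page it came from for the later analysis.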

Yasser