I'd like to fetch a web page along with its images, Flash animations, and other embedded objects. What's a straightforward way to do this?
+2
A:
You could write a web crawler in Java. This article walks through one: http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
giri
2010-04-18 23:26:36
Actually, it would be simpler to choose an existing one. Hopefully someone will add an answer that lists some good alternatives.
Stephen C
2010-04-19 00:40:01
+1
A:
Use an open-source HTML parser such as HtmlCleaner (http://java-source.net/open-source/html-parsers/htmlcleaner) or CyberNeko HTML, also known as NekoHTML (http://java-source.net/open-source/html-parsers/nekohtml).
Once the parser has built a DOM for the web page, you can query it for the elements that reference embedded content (images, scripts, embedded objects), extract their src attributes, resolve any relative URLs against the page's own URL, and download each referenced resource.
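As a rough illustration of the "extract the src attributes" step, here is a minimal sketch. To stay self-contained it does not use HtmlCleaner or NekoHTML; it swaps in the JDK's built-in Swing HTML parser, which is callback-based rather than a real DOM and only understands fairly old HTML. The class name EmbeddedResourceLister, the method listSrcUrls, and the example.com URLs are made up for this example. Downloading each resulting URL is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists the absolute URLs of every src attribute in a page.
public class EmbeddedResourceLister {

    // html is the page source; baseUri is the URL the page was fetched from.
    public static List<URI> listSrcUrls(String html, URI baseUri) throws IOException {
        final List<URI> found = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <img> are reported here.
                collect(attrs);
            }

            private void collect(MutableAttributeSet attrs) {
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src != null) {
                    // Resolve relative references against the page URL.
                    // Note: resolve() throws IllegalArgumentException on malformed values.
                    found.add(baseUri.resolve(src.toString()));
                }
            }
        };

        // true = ignore any charset declared inside the document.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<img src=\"images/logo.png\">"
                + "<img src=\"/banner.gif\">"
                + "</body></html>";
        URI base = URI.create("http://example.com/docs/index.html");
        for (URI uri : listSrcUrls(html, base)) {
            System.out.println(uri);
        }
    }
}
```

With a real DOM parser the idea is the same: walk or query the tree, read the src attribute (or data on object elements), and resolve it against the page URL before fetching. Flash content is usually referenced from object or embed elements, so those need the same treatment.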
Finbarr
2010-04-19 00:42:46