Hi
Currently I need a program that, given a URL, returns a list of all the images on the webpage, e.g.:
logo.png gallery1.jpg test.gif
Is there any open source software available before I try to code something myself?
The language should be Java. Thanks, Philip
Just use a simple HTML parser, like jTidy: get all elements with the tag name img, then collect the src attribute of each into a List<String> (or perhaps List<URI>).
You can obtain an InputStream for a URL using URL#openStream() and then feed it to any HTML parser you like. Here's a kickoff example:
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

InputStream input = new URL("http://www.stackoverflow.com").openStream();
Document document = new Tidy().parseDOM(input, null);
NodeList imgs = document.getElementsByTagName("img");

List<String> srcs = new ArrayList<String>();
for (int i = 0; i < imgs.getLength(); i++) {
    srcs.add(imgs.item(i).getAttributes().getNamedItem("src").getNodeValue());
}

for (String src : srcs) {
    System.out.println(src);
}
I must admit, however, that HtmlUnit, as suggested by Bozho, indeed looks better.
HtmlUnit has HtmlPage.getElementsByTagName("img"), which will probably suit you.
(Read the short Get started guide to see how to obtain the correct HtmlPage object.)
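If pulling in a third-party library is a concern, the JDK's own (if dated) HTML parser in javax.swing.text.html can do a rough version of the same scan. Here's a minimal sketch; the class name ImageLister and the sample HTML are just for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ImageLister {

    // Collects the src attribute of every <img> tag in the given HTML.
    public static List<String> listImages(String html) throws Exception {
        final List<String> srcs = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> is an empty element, so the parser reports it here.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        srcs.add(src.toString());
                    }
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return srcs;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><img src=\"logo.png\"><p><img src=\"gallery1.jpg\"></body></html>";
        System.out.println(listImages(html));
    }
}
```

To run it against a live page, wrap URL#openStream() in an InputStreamReader and pass that Reader to parse() instead of the StringReader. Note this parser is far more tolerant of broken markup than a strict XML parser, but HtmlUnit or jTidy will still handle real-world pages more robustly.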
This is dead simple with HTML Parser (and any other decent HTML parser):
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.SimpleNodeIterator;

Parser parser = new Parser("http://www.yahoo.com/");
NodeList list = parser.parse(new TagNameFilter("IMG"));

for (SimpleNodeIterator iterator = list.elements(); iterator.hasMoreNodes();) {
    Tag tag = (Tag) iterator.nextNode();
    System.out.println(tag.getAttribute("src"));
}