ansaurus

Question

Get all Images from WebPage Program | Java

Answer 1

A:

You can use wget, which has a lot of options available.

Or search for "java wget" ...

PeterMmm 2010-01-31 18:21:08

Answer 2

+5 A:

Just use a simple HTML parser such as jTidy, get all elements with the tag name img, and collect the src attribute of each in a List<String> or maybe a List<URI>.

You can obtain an InputStream of a URL using URL#openStream() and then feed it to any HTML parser you like. Here's a kickoff example:

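// Needs java.io.InputStream, java.net.URL, java.util.*, org.w3c.dom.* and org.w3c.tidy.Tidy.
InputStream input = new URL("http://www.stackoverflow.com").openStream();
Document document = new Tidy().parseDOM(input, null);
NodeList imgs = document.getElementsByTagName("img");
List<String> srcs = new ArrayList<String>();

for (int i = 0; i < imgs.getLength(); i++) {
    // Skip img elements without a src attribute to avoid a NullPointerException.
    Node src = imgs.item(i).getAttributes().getNamedItem("src");
    if (src != null) {
        srcs.add(src.getNodeValue());
    }
}

for (String src : srcs) {
    System.out.println(src);
}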

I must admit, however, that HtmlUnit, as suggested by Bozho, does look better.

BalusC 2010-01-31 18:21:56

HtmlUnit does roughly what your answer describes, so +1 for clarifying what exactly should happen.

Bozho 2010-01-31 18:48:58

HtmlUnit is, however, less bloated than jTidy. It offers *built-in* ways to open a web page and obtain elements/attributes of interest using XPath.

BalusC 2010-01-31 19:28:33
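
The XPath idea from the comment above can be tried with the JDK alone. The following is only a minimal sketch under stated assumptions, not HtmlUnit's API: the class name XPathImageSources and the method imageSources are hypothetical, and the input is assumed to be well-formed XHTML without a DOCTYPE or namespace declaration. Real-world pages usually need an HTML parser such as jTidy or HtmlUnit first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; uses only the JDK's javax.xml.xpath API.
class XPathImageSources {

    // Return the src value of every img element, in document order.
    // Assumption: the markup is well-formed XML and declares no namespace.
    static List<String> imageSources(String xhtml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//img/@src" selects the src attribute node of each img element.
        NodeList attrs = (NodeList) xpath.evaluate(
                "//img/@src",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
        List<String> srcs = new ArrayList<String>();
        for (int i = 0; i < attrs.getLength(); i++) {
            srcs.add(attrs.item(i).getNodeValue());
        }
        return srcs;
    }
}
```

Because the expression selects the attribute nodes directly, img elements without a src attribute are skipped without any explicit null check.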

Answer 3

+3 A:

HtmlUnit has HtmlPage.getElementsByTagName("img"), which will probably suit you.

(Read the short Get started guide to see how to obtain the correct HtmlPage object.)

Bozho 2010-01-31 18:23:24

Answer 4

A:

You can parse the HTML and collect the SRC attributes of all IMG elements in a Collection. Then download the resource at each URL and write it to a file. Several HTML parsers are available for the parsing step; Cobra is one of them.

craftsman 2010-01-31 18:24:08
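
The download step described in the answer above might look roughly like this with the JDK alone. This is a sketch rather than a definitive implementation: the class ImageDownloader and its methods resolve, fileNameFor and download are hypothetical names, and it assumes the src values were already collected by whichever parser you picked. Since src values are often relative, they are resolved against the page URL first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of any library mentioned in this thread.
class ImageDownloader {

    // Resolve a possibly relative src value against the URL of the page it came from.
    // Assumption: src is a syntactically valid URI (no unescaped spaces, for example).
    static URI resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src);
    }

    // Derive a local file name from the last path segment, falling back to "image".
    static String fileNameFor(URI image) {
        String path = image.getPath();
        if (path == null) {
            return "image"; // opaque URIs such as data: have no path
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "image" : name;
    }

    // Download one image into targetDir, overwriting any existing file of the same name.
    static Path download(URI image, Path targetDir) throws IOException {
        Path target = targetDir.resolve(fileNameFor(image));
        try (InputStream in = image.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

For each collected src you would call something like ImageDownloader.download(ImageDownloader.resolve(pageUrl, src), targetDir). Two images with the same file name overwrite each other in this sketch, so a real program would want a more careful naming scheme.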

Answer 5

+1 A:

This is dead simple with HTML Parser (or any other decent HTML parser):

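// Needs org.htmlparser.Parser, org.htmlparser.Tag, org.htmlparser.filters.TagNameFilter,
// org.htmlparser.util.NodeList and org.htmlparser.util.SimpleNodeIterator.
Parser parser = new Parser("http://www.yahoo.com/");
NodeList list = parser.parse(new TagNameFilter("IMG"));

// Print the src attribute of every IMG tag found on the page.
for (SimpleNodeIterator iterator = list.elements(); iterator.hasMoreNodes(); ) {
    Tag tag = (Tag) iterator.nextNode();
    System.out.println(tag.getAttribute("src"));
}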

Pascal Thivent 2010-01-31 18:52:45
