Use HtmlUnit if you want
- FAST
- SIMPLE
Java-based web interaction/crawling.
For example, here is a simple program that prints the page title and then, for every IMG element on the loaded page, its tag name, width, height, text and XML:
    import java.io.IOException;

    // HtmlUnit 2.x package names; HtmlUnit 3.x moved these to org.htmlunit.
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class HtmlUnitTest {
        public static void main(String[] args) throws IOException {
            final WebClient webClient = new WebClient();
            final HtmlPage page = webClient.getPage("http://www.google.com");
            System.out.println(page.getTitleText());

            // Walk every element on the page and report on the IMG elements.
            for (HtmlElement node : page.getHtmlElementDescendants()) {
                if ("img".equalsIgnoreCase(node.getTagName())) {
                    System.out.println("NAME: " + node.getTagName());
                    System.out.println("WIDTH: " + node.getAttribute("width"));
                    System.out.println("HEIGHT: " + node.getAttribute("height"));
                    System.out.println("TEXT: " + node.asText());
                    System.out.println("XML: " + node.asXml());
                }
            }
        }
    }
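The width and height values printed above come back from getAttribute as strings, and HtmlUnit returns an empty string when the attribute is not set on the element. If you need numbers rather than text, something like the following might do. It is only a sketch: ImageDimensions and parseDimension are hypothetical names, not part of HtmlUnit, and the sample values in main are made up for illustration.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of HtmlUnit: converts the string returned by
// node.getAttribute("width") or node.getAttribute("height") into a pixel count.
final class ImageDimensions {

    // Returns the pixel value, or empty when the attribute is missing, blank,
    // or not a plain non-negative integer (for example "50%").
    static OptionalInt parseDimension(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        String value = raw.trim();
        if (value.isEmpty() || !value.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(value));
        } catch (NumberFormatException tooLarge) {
            // All digits, but too big to fit in an int.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for what getAttribute might return.
        for (String raw : new String[] {"272", " 92 ", "", "50%"}) {
            System.out.println("'" + raw + "' -> " + parseDimension(raw));
        }
    }
}
```

Inside the loop from the first example you would call it as ImageDimensions.parseDimension(node.getAttribute("width")) and check isPresent() before using the value.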
Example #2: accessing named input fields, entering data and clicking:
    // Reuses the webClient from the first example.
    final HtmlPage page = webClient.getPage("http://www.google.com");

    // Type into the input field named "q".
    HtmlElement inputField = page.getElementByName("q");
    inputField.type("Example input");

    // Click the button named "btnG"; click() returns the resulting page.
    HtmlElement btnG = page.getElementByName("btnG");
    Page secondPage = btnG.click();
    if (secondPage instanceof HtmlPage) {
        System.out.println(page.getTitleText());
        System.out.println(((HtmlPage) secondPage).getTitleText());
    }
NB: You can call refresh() on an HtmlPage object, such as page above, to reload it.