I want to develop an HTTP client in Java for a college project. It needs to log in to a site, extract data from the HTML, and fill in and submit forms. I don't know which HTTP library to use. Apache HTTP Client doesn't build a DOM model, but it handles HTTP redirects and multithreading. HTTPUnit builds a DOM model and makes it easy to work with forms, fields, tables, etc., but I don't know how well it handles multithreading and proxy settings.

Any advice?

+1  A: 

HTTPUnit is for unit testing. Unless you mean a "testing client", I don't think it's appropriate for creating an application.

> I want to develop an HTTP client in Java

You realize, of course, that the Apache HTTP client is not your answer either. It sounds like you want to create your first web app.

You'll need servlets and JSPs. Get Apache's Tomcat and learn enough JSP and JSTL to do what you need to do. Don't bother with frameworks, since it's your first one.

When you have it running, then try a framework like Spring.

duffymo
The question seems to be quite clearly client-side. Servlets and JSPs aren't relevant for the client-side functionality.
lexicore
Doesn't sound like jorik1000 is trying to develop a server-side application, but rather a specialised web client that scrapes and submits information. HttpUnit is designed to make unit testing of web pages easy, but as a consequence it's also a good tool for working with a web page at a high level to do general stuff like pulling out information and filling in forms.
isme
JSPs aren't client side?
duffymo
JSP (JavaServer Pages) is a server-side technology like PHP and Perl. A client only ever sees the result of the server's processing of JSP directives.
isme
I realize that they're compiled into servlets that run on the server side, but the fact that the client "sees" the result sure has a client-side flavor to me.
duffymo
+1  A: 

There seems to be cURL support for Java:
http://curl.haxx.se/libcurl/java/

Vitalyson
I like cURL, but why depend on a native C library when there's a pure Java library such as Apache HTTPClient?
R. Kettelerij
+1  A: 

Depends on how complex your websites are. Options are Apache HttpClient (plus something like JTidy) or testing-oriented packages like HtmlUnit or Canoo WebTest. HtmlUnit is quite powerful - you'd be able to process JavaScript, for instance.
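
Roughly, the HttpClient + JTidy combination would look something like this (just a sketch, assuming HttpClient 4.x and JTidy on the classpath; http://some_url is a placeholder):

import java.io.InputStream;

import javax.xml.xpath.XPathFactory;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class ScrapeSketch {
    public static void main(String[] args) throws Exception {
        // Fetch the raw HTML with HttpClient (handles redirects, cookies, etc.)
        DefaultHttpClient client = new DefaultHttpClient();
        HttpResponse response = client.execute(new HttpGet("http://some_url"));
        InputStream in = response.getEntity().getContent();

        // Clean the tag soup into a well-formed DOM with JTidy
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        Document doc = tidy.parseDOM(in, null);

        // Query the cleaned DOM with standard XPath
        String title = XPathFactory.newInstance().newXPath().evaluate("//title", doc);
        System.out.println(title);
    }
}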

lexicore
+1 for pointing out [Canoo WebTest](http://webtest.canoo.com/webtest/manual/WebTestHome.html). It's new to me. But it looks like it's designed more specifically for testing pages, and not suitable for general page manipulation and data extraction. How does it compare to HtmlUnit?
isme
+2  A: 

It sounds like you are trying to create a web-scraping application. For this purpose, I recommend the HtmlUnit library.

It makes it easy to work with forms, proxies, and data embedded in web pages. Under the hood I think it uses Apache's HttpClient to handle HTTP requests, but this is probably too low-level for you to be worried about.

With this library you can control a web page in Java the same way you would control it in a web browser: clicking a button, typing text, selecting values.

Here are some examples from HtmlUnit's getting started page:

Submitting a form:

import org.junit.Test;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

@Test
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();

    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");

    // Get the form that we are dealing with and within that form, 
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");

    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");

    // Change the value of the text field
    textField.setValueAttribute("root");

    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();

    webClient.closeAllWindows();
}

Using a proxy server:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.DefaultCredentialsProvider;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void homePage_proxy() throws Exception {
    final int myProxyPort = 8080;  // placeholder proxy port
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2, "http://myproxyserver", myProxyPort);

    //set proxy username and password 
    final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
    credentialsProvider.addProxyCredentials("username", "password");

    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());

    webClient.closeAllWindows();
}

The WebClient class is single threaded, so every thread that deals with a web page will need its own WebClient instance.
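
So if you want to fetch pages in parallel, the pattern might look something like this (a sketch using a fixed thread pool; the URLs are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ParallelScrape {
    public static void main(String[] args) {
        String[] urls = { "http://some_url/page1", "http://some_url/page2" };
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (final String url : urls) {
            pool.submit(new Runnable() {
                public void run() {
                    // One WebClient per task: instances must not be shared between threads
                    final WebClient webClient = new WebClient();
                    try {
                        final HtmlPage page = webClient.getPage(url);
                        System.out.println(page.getTitleText());
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        webClient.closeAllWindows();
                    }
                }
            });
        }
        pool.shutdown();
    }
}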

If you don't need to process JavaScript or CSS, you can also disable them when you create the client:

WebClient client = new WebClient();
client.setJavaScriptEnabled(false);
client.setCssEnabled(false);
isme
A: 

Jetty has a nice client-side library. I like to use it because I often need to create a server along with the client. The Apache HTTP Client is also really good and seems to have a few more features, like being able to work through a proxy using SSL.
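
A basic GET with Jetty's client looks something like this (a sketch based on the Jetty 9 API; the URL is a placeholder):

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;

public class JettyClientSketch {
    public static void main(String[] args) throws Exception {
        final HttpClient client = new HttpClient();
        client.start();  // the client must be started before use

        final ContentResponse response = client.GET("http://some_url");
        System.out.println(response.getStatus());
        System.out.println(response.getContentAsString());

        client.stop();
    }
}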

Joshua
+1  A: 

HTTPUnit is meant for testing purposes; I don't think it's well suited to being embedded inside your application.

When you want to consume HTTP resources (like webpages) I'd recommend Apache HTTPClient. But you may find this framework too low-level for your use case, which is webpage scraping. So I'd recommend an integration framework like Apache Camel for this purpose. For example, the following route reads a webpage (using Apache HTTPClient), transforms the HTML to well-formed HTML (using TagSoup), and transforms the result to an XML representation for further processing.

from("http://mycollege.edu/somepage.html).unmarshall().tidyMarkup().to("xslt:mystylesheet.xsl")

You can further process the resulting XML using XPath, or transform it into a POJO using JAXB, for example.
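
For instance, pulling a single value out of the resulting XML with the standard javax.xml.xpath API might look like this (a sketch; the XML snippet and the expression are placeholders):

import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

public class XPathSketch {
    public static void main(String[] args) throws Exception {
        final String xml = "<html><head><title>Some College</title></head></html>";

        // Evaluate an XPath expression directly against the XML input
        final XPath xpath = XPathFactory.newInstance().newXPath();
        final String title = xpath.evaluate("/html/head/title",
                new InputSource(new StringReader(xml)));

        System.out.println(title);  // prints "Some College"
    }
}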

R. Kettelerij
I use HtmlUnit because it's easy. I can pull out the info I need from a page by XPath and then run away. What you are suggesting sounds like overkill. Why do you recommend this way? What's wrong with using HtmlUnit in an application?
isme
+1 for mentioning the HttpClient + TagSoup combo. When I rolled my own scraping library, these worked great together, and were faster than the full-fat HtmlUnit.
isme
Note the 'Unit' part; these libraries are primarily focused on (unit) testing. Nevertheless, I've removed the reference to HtmlUnit since it provides more general scraping functions.
R. Kettelerij
I would say that unit-testing a web site is a use case of web scraping. Both HttpUnit and HtmlUnit make it easy to scrape sites for information. I confess I haven't used HttpUnit, but their [unit-testing howto](http://httpunit.sourceforge.net/doc/cookbook.html) reads as a scraping howto just as well. As I understand it, HtmlUnit has better DOM support (through the magic `getByXPath` method), but HttpUnit exposes more HTTP concepts, like the raw requests and responses. Whether the HTTP stuff is useful depends on the site you're trying to scrape.
isme
A: 

If you really want to simulate a browser, then try Selenium RC.

flybywire