It sounds like you are trying to create a web-scraping application. For this purpose, I recommend the HtmlUnit library.
It makes it easy to work with forms, proxies, and data embedded in web pages. Under the hood it uses Apache's HttpClient to handle HTTP requests, but that layer is low-level enough that you normally don't need to touch it.
With this library you can control a web page in Java the same way you would control it in a web browser: clicking a button, typing text, selecting values.
Here are some examples from HtmlUnit's getting started page:
Submitting a form:
    @Test
    public void submittingForm() throws Exception {
        final WebClient webClient = new WebClient();

        // Get the first page
        final HtmlPage page1 = webClient.getPage("http://some_url");

        // Get the form that we are dealing with and within that form,
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");
        final HtmlSubmitInput button = form.getInputByName("submitbutton");
        final HtmlTextInput textField = form.getInputByName("userid");

        // Change the value of the text field
        textField.setValueAttribute("root");

        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();

        webClient.closeAllWindows();
    }
Using a proxy server:
    @Test
    public void homePage_proxy() throws Exception {
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2, "http://myproxyserver", myProxyPort);

        // Set proxy username and password
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addProxyCredentials("username", "password");

        final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
        assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());

        webClient.closeAllWindows();
    }
The WebClient class is not thread-safe, so every thread that works with a web page needs its own WebClient instance.
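One way to give each thread its own instance is a ThreadLocal. The following is only a minimal sketch of that pattern, not HtmlUnit code: StubClient is a hypothetical stand-in for WebClient so that the example compiles without the library on the classpath, and the class and method names are made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for HtmlUnit's WebClient (an assumption made so the
// sketch runs without the library); in real code you would use WebClient here.
class StubClient {
    String describe(String url) {
        return Thread.currentThread().getName() + " fetched " + url;
    }
}

class PerThreadClientDemo {
    // Each thread lazily creates its own client the first time it calls get().
    private static final ThreadLocal<StubClient> CLIENT =
            ThreadLocal.withInitial(StubClient::new);

    // Returns the client that belongs to the calling thread.
    static StubClient client() {
        return CLIENT.get();
    }

    // Starts threadCount workers and returns how many distinct client
    // instances they used in total.
    static int distinctClients(int threadCount) {
        Set<StubClient> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                // Two lookups on the same thread yield the same instance,
                // so each worker contributes exactly one entry to the set.
                seen.add(client());
                seen.add(client());
            }, "scraper-" + i);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct clients: " + distinctClients(4));
    }
}
```

With a real WebClient you would still have to close each thread's client when that thread is done with it; the sketch leaves that out.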
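Unless you need to process JavaScript or CSS, you can also disable both right after creating the client (in newer HtmlUnit releases these setters are on the object returned by client.getOptions() instead):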
    WebClient client = new WebClient();
    client.setJavaScriptEnabled(false);
    client.setCssEnabled(false);