I want to access forms on HTML pages from Java without involving a real browser.

At present I am doing this with HtmlUnit, but it takes a little extra time to load each page. When it comes to accessing millions of pages, that extra time matters a great deal.

Are there any other methods for doing this?

A: 

Accessing a web page using a browser, even HtmlUnit, is going to be slow. A better method is to test the layer just below the web interface, so that you don't need to access millions of pages -- instead you test enough to make sure that the web interface is using the lower layer correctly.

Michael Williamson
A: 

Most of the interaction in a browser comes down to an HTTP GET or an HTTP POST. You need to figure out exactly which operation you need, and then you can construct the URL and/or form data. Then you can use something like this:

    // Needs: import java.io.*; import java.net.*;
    try {
        // Construct the URL-encoded form data
        String data = URLEncoder.encode("key1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
        data += "&" + URLEncoder.encode("key2", "UTF-8") + "=" + URLEncoder.encode("value2", "UTF-8");

        // Send the data; setDoOutput(true) makes an HTTP connection use POST
        URL url = new URL("http://hostname:80/cgi");
        URLConnection conn = url.openConnection();
        conn.setDoOutput(true);
        OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
        wr.write(data);
        wr.flush();

        // Get the response
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
            // Process line...
        }
        wr.close();
        rd.close();
    } catch (IOException e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
nont
+2  A: 

I've used something similar called HttpUnit before, but I have no idea how it compares performance-wise.

If you have millions of pages to process, I would recommend throwing some more threads at it. Just a guess, but I think that if you scale this up to multiple threads, you'll run out of bandwidth before you run out of CPU power (in which case it won't matter how much faster a single request could be).
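One way to try the multi-threaded suggestion above is a fixed-size thread pool from `java.util.concurrent`. The following is only a minimal sketch under stated assumptions: the class name `ParallelFetcher`, the methods `fetchAll` and `httpGet`, and the choice to record a failed fetch as `null` are all hypothetical and not part of any library mentioned in this thread. The fetch step is passed in as a function so it can be swapped for a POST or for an HtmlUnit call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
public class ParallelFetcher {

    // Runs `fetcher` for every URL on a fixed-size thread pool and returns
    // a url -> body map in input order. A fetch that throws maps to null.
    public static Map<String, String> fetchAll(List<String> urls, int threads,
                                               Function<String, String> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all tasks first so they run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetcher.apply(url)));
            }
            // Then collect the results in the original order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(urls.get(i), null); // assumption: null marks a failure
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while fetching", e);
        } finally {
            pool.shutdown();
        }
    }

    // Plain HTTP GET that returns the response body, assuming UTF-8 text.
    public static String httpGet(String address) {
        try {
            URLConnection conn = new URL(address).openConnection();
            try (BufferedReader rd = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = rd.readLine()) != null) {
                    body.append(line).append('\n');
                }
                return body.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ParallelFetcher.fetchAll(urls, 16, ParallelFetcher::httpGet)` would fetch the list with 16 worker threads. The pool size of 16 is an arbitrary starting point; if bandwidth really is the bottleneck, adding threads beyond that point should stop helping.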

Eric Petroelje