
I am trying to use Java to submit a captcha to decaptcher.com. Decaptcher doesn't do a good job of explaining how to use its API, so I am trying to figure out how to submit a captcha with an HTTP POST request. Here is the example form from their website:

<form 
 method="post" 
 action="http://poster.decaptcher.com/" 
 enctype="multipart/form-data">
 <input type="hidden" name="function"  value="picture2">
 <input type="text"   name="username"  value="client">
 <input type="text"   name="password"  value="qwerty">
 <input type="file"   name="pict">
 <input type="text"   name="pict_to"   value="0">
 <input type="text"   name="pict_type" value="0">
 <input type="submit" value="Send">
</form>
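For reference, submitting this form makes the browser send one `multipart/form-data` POST: one part per field, with the chosen image sent as raw bytes in the `pict` part. The sketch below builds such a body with the plain JDK (no HtmlUnit) and posts it with `HttpURLConnection`. It is a minimal illustration under assumptions: the class and method names are hypothetical, the boundary string is arbitrary, and the file part is placed after all text fields rather than in the form's exact order, which multipart parsers usually ignore but which has not been checked against decaptcher.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: a JDK-only sketch of the request the decaptcher form produces.
class MultipartFormSketch {

    private static final String CRLF = "\r\n";

    // Builds a multipart/form-data body: one part per text field (in map order),
    // followed by a single file part carrying the raw file bytes.
    static byte[] buildBody(String boundary, Map<String, String> fields,
                            String fileField, String fileName,
                            String contentType, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            writeText(out, "--" + boundary + CRLF);
            writeText(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"" + CRLF + CRLF);
            writeText(out, field.getValue() + CRLF);
        }
        writeText(out, "--" + boundary + CRLF);
        writeText(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        writeText(out, "Content-Type: " + contentType + CRLF + CRLF);
        // The image is copied byte for byte; no charset is applied to it.
        out.write(fileBytes, 0, fileBytes.length);
        writeText(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    // Headers and text values are encoded as UTF-8 (an assumption; they are plain ASCII here).
    private static void writeText(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // Posts the body and returns the reply, assuming the server answers with UTF-8 text.
    static String post(String url, String boundary, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

To use it, fill a `LinkedHashMap` with `function=picture2`, `username`, `password`, `pict_to=0` and `pict_type=0`, read the image with `Files.readAllBytes(Paths.get("captcha.png"))`, and pass the resulting body to `post("http://poster.decaptcher.com/", boundary, body)` with the same boundary. A non-2xx reply makes `getInputStream()` throw; a fuller version would read `getErrorStream()` instead.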

I am supposed to send a POST request like that to the web server and get a string back. Here is my attempt to implement that in Java:

public String getDecaptcherAnswer(String username, String password){
    try{
        URL decaptcherPostURL = new URL("http://poster.decaptcher.com/");
        WebRequestSettings request = new WebRequestSettings(decaptcherPostURL, HttpMethod.POST);
        request.setEncodingType(FormEncodingType.MULTIPART);
        ArrayList<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new NameValuePair("function", "picture2"));
        params.add(new NameValuePair("username", username));
        params.add(new NameValuePair("password", password));

        //I added this block in
        File file = new File("captcha.png");
        params.add(new KeyDataPair("pict", file, "png", "utf-8"));
        //----------------------

        params.add(new NameValuePair("pict_to", "0"));
        params.add(new NameValuePair("pict_type", "0"));
        request.setRequestParameters(params);
        request.setUrl(decaptcherPostURL);

        // webClient is a WebClient field of the enclosing class
        HtmlPage page = webClient.getPage(request);
        System.out.println(page.asText());
        System.out.println("--------------------------------------");
        System.out.println(page.asXml());

        return page.asText();
    }catch (Exception e){
        e.printStackTrace();
        return null;
    }
}

Am I supposed to set the value of pict to a File object instead of the String pointing to where the captcha is stored? (captcha.png is the name of the image I am trying to submit).

A: 

You should not use a NameValuePair for this but its subclass, KeyDataPair. This way you can specify a file to upload.

The following should work:

new KeyDataPair("pict", new File(fileName), "image/png", "utf-8");

The content type parameter is the MIME type of the file. Since you are uploading a PNG file, it should be image/png.
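On the charset question raised in the comments: a charset describes how characters map to bytes, so it applies to text, not to image data. A PNG file is binary and always starts with the fixed 8-byte signature `89 50 4E 47 0D 0A 1A 0A`; the leading `0x89` is not even a valid first byte in UTF-8. The small sketch below (hypothetical class name, JDK only) checks for that signature and shows that a strict UTF-8 decoder rejects it. It does not show what HtmlUnit itself does with the charset argument of `KeyDataPair` for a binary part; that may depend on the HtmlUnit version.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating that PNG data is binary, not text.
class PngBytesSketch {

    // The fixed 8-byte signature every PNG file starts with.
    static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    // True if the data begins with the PNG signature.
    static boolean looksLikePng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    // True if the bytes decode as well-formed UTF-8; a strict decoder reports bad input.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Checking `looksLikePng(Files.readAllBytes(Paths.get("captcha.png")))` before uploading is a cheap way to confirm the file really is a PNG, which is what the `image/png` content type promises the server.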

Ronald Wildenberg
Would I declare the KeyDataPair as:
Dylan
Pretend I create a File object called file from "captcha.png": new KeyDataPair("pict", file, "png", "utf-8"). Are PNG files encoded with UTF-8?
Dylan
I added an example that I think should work. I'm not sure about the utf-8 charset, maybe you should experiment a little with that.
Ronald Wildenberg
For the charset, you can use htmlPage.getPageEncoding();
Ahmed Ashour
I made the necessary changes, but now the site returns a timeout error. I think this means my request is working, since I wasn't getting anything back before, but I don't know why the request is timing out.
Dylan
Also, @Ahmed, doesn't charset refer to the file encoding?
Dylan
@Dylan, the current implementation of HtmlUnit sends the charset as set by the page. If you believe otherwise, please submit a bug report with a minimal test case to the HtmlUnit tracker.
Ahmed Ashour
If you create an html page with a form as described in the documentation (http://decaptcher.com/client/) and in your question, does it work then? If it does, there's still a difference between your code and the expected form data. If it doesn't, there is something wrong with the documentation.
Ronald Wildenberg
A: 

Here's what I was trying to type:

File file = new File("captcha.png");
params.add(new KeyDataPair("pict", file, "png", "utf-8"));

Are PNG files encoded with UTF-8? Is that how I would specify the KeyDataPair for the file input? I think I am either specifying the wrong contentType or the wrong charSet, or both. Am I supposed to put them in all caps?

Dylan
@Dylan: This shouldn't be a separate answer. This should be added in to your original question as an EDIT.
Justian Meyer
A: 

There is a higher-level mechanism for sending that file; you don't need to create a WebRequestSettings and set its individual values.

You should host that static HTML somewhere and do something like the following.

If you still have an issue, please submit a bug report in the HtmlUnit bug tracker.

BTW, HtmlUnit 2.8 is about to be released; give it a try.

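import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlFileInput;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient webClient = new WebClient();
// Load a copy of the documented form; its hidden "function" input is already set
HtmlPage page = webClient.getPage("http://some_host/test.html");
HtmlForm form = page.getForms().get(0);
form.getInputByName("username").setValueAttribute(username);
form.getInputByName("password").setValueAttribute(password);
form.getInputByName("pict_to").setValueAttribute("0");
form.getInputByName("pict_type").setValueAttribute("0");
form.getInputByName("pict").setValueAttribute("full_path_to_captcha_png");
form.<HtmlFileInput>getInputByName("pict").setContentType("image/png");//optional
// Clicking Send submits the form as a multipart POST, like a browser would
HtmlPage page2 = form.getInputByValue("Send").click();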
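If hosting the static HTML is inconvenient, the form can also be written to a local file. The sketch below (hypothetical class name, JDK only) writes the documented form to a temporary file and returns its path; `path.toUri().toURL()` then gives a `file:` URL. Passing that URL to `webClient.getPage(...)` in place of `http://some_host/test.html` assumes your HtmlUnit version can load `file:` URLs, which you should verify. Because the form's `action` is absolute, clicking Send still posts to `http://poster.decaptcher.com/`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes the documented decaptcher form to a local HTML file.
class LocalFormSketch {

    // Same fields as the form quoted in the question.
    static final String FORM_HTML =
        "<html><body>\n"
        + "<form method=\"post\" action=\"http://poster.decaptcher.com/\" enctype=\"multipart/form-data\">\n"
        + "  <input type=\"hidden\" name=\"function\" value=\"picture2\">\n"
        + "  <input type=\"text\" name=\"username\" value=\"client\">\n"
        + "  <input type=\"text\" name=\"password\" value=\"qwerty\">\n"
        + "  <input type=\"file\" name=\"pict\">\n"
        + "  <input type=\"text\" name=\"pict_to\" value=\"0\">\n"
        + "  <input type=\"text\" name=\"pict_type\" value=\"0\">\n"
        + "  <input type=\"submit\" value=\"Send\">\n"
        + "</form>\n"
        + "</body></html>\n";

    // Writes FORM_HTML to a temporary .html file (deleted on JVM exit) and returns its path.
    static Path writeFormToTempFile() {
        try {
            Path file = Files.createTempFile("decaptcher-form", ".html");
            Files.write(file, FORM_HTML.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the form local also makes it easy to follow the suggestion in the comments: load it in a real browser first to see whether decaptcher accepts the documented form at all.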
Ahmed Ashour
Are you sure this will work for a POST request?
Dylan
Yes. HtmlUnit is meant to take that headache away from you :) Please test with 2.8.
Ahmed Ashour