I have tried several ways to log in to a website through Java. I have used Watij, HtmlUnit, etc., but because I am not very familiar with any of them, I have not been able to log in successfully.

Can anyone tell me in detail how to log in through Java?

To be more specific, I want to log in to Orkut and get the page source of the page that comes after login.

A: 

Why are you trying to log in via Java? Why not just use cURL? Is there something specific you're trying to accomplish?

Homework
I want to process the page source of various pages after login. Is it possible to log in through cURL and get the page source in a Java program, so that I can process that document and then pass the next URL to cURL to get the page source of the next page?
Yatendra Goel
Yes, it's possible.
Homework
A: 

Orkut uses Google auth for login. My suggestion is to use an HTTP debugger like Fiddler to watch the traffic during login. There are probably cookies and redirects that you need to replicate.

Generally,

  1. Look at the login form and get the names of the name and password fields and the action URL that the form posts to
  2. Create a POST request to the action URL and pass in the name and password correctly (e.g. name=username&password=pwd)
  3. Check whether the form is submitted over HTTPS, and if so make sure your request uses HTTPS correctly
  4. If the response has a Set-Cookie header, make sure to send that cookie on all subsequent requests
  5. If the response is a redirect, then do a GET for the redirect target, sending cookies if appropriate
  6. (keep looping on #5 until you don't get a redirect)

The response you get at the end of this is the page source.
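The steps above can be sketched with the JDK's own `java.net.http.HttpClient` (Java 11+). This is only a minimal sketch, not a working Orkut login: the field names `name` and `password` come from the example in step 2, the action URL is whatever the real form posts to, and the class and method names are made up for illustration. The `CookieManager` is assumed to cover step 4, and `Redirect.NORMAL` steps 5 and 6.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; none of these names come from a real library.
class FormLoginSketch {

    // Step 2: encode the form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Steps 4-6: the cookie manager stores cookies from Set-Cookie headers and
    // replays them; the client follows redirects itself.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // POST the credentials and return the body of the final response.
    static String login(HttpClient client, String actionUrl, Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "username");
        fields.put("password", "pwd");
        System.out.println(formEncode(fields)); // prints name=username&password=pwd
        if (args.length > 0) {
            // Only touches the network when an action URL is passed in.
            System.out.println(login(newSessionClient(), args[0], fields));
        }
    }
}
```

Reusing the same client for later GET requests keeps the session cookies, so the next page can be fetched the same way. A real login form may also need hidden fields that this sketch does not handle.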

Take a look at this:

http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/http/AuthSubUtil.html
http://code.google.com/p/apex-google-data/source/browse/trunk/google%5Fdata%5Ftoolkit/src/classes/AuthSubUtil.cls

This looks like Google's code for authenticating with their services.

Lou Franco
+2  A: 

The answer depends on how the website attempts to authenticate you:

  • Do you have to set a username and password in the HTTP headers (basic auth)?
  • Or do you have to fill out and submit a form containing the username and password?

For either I would recommend commons-httpclient, although the latter screen-scraping approach is always messy to do programmatically.

For basic authentication, take a look at httpclient's Authentication Guide.
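To show what basic auth amounts to on the wire, here is a minimal sketch that builds the `Authorization` header by hand. Note that it uses the JDK's `java.net.http` classes (Java 11+) rather than commons-httpclient, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, for illustration only.
class BasicAuthSketch {

    // Basic auth is "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Build a GET request that carries the credentials in its headers.
    static HttpRequest authenticatedGet(String url, String user, String password) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }
}
```

Since the credentials are only base64-encoded, not encrypted, this should be sent over HTTPS.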

For form-based authentication, you'll need to check the HTML source of the page to find:

  • The URL the form is submitted to
  • What the names of the parameters to submit are
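As a rough illustration of pulling those two things out of the login page, the sketch below uses plain regular expressions. That is an assumption made to keep the example dependency-free; regexes are fragile on real-world HTML and a proper HTML parser would be more robust. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; a quick-and-dirty inspection of a login form.
class FormInspectorSketch {

    // Matches the quoted action attribute of a <form> tag.
    private static final Pattern ACTION = Pattern.compile(
            "<form\\b[^>]*\\baction\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Matches the quoted name attribute of each <input> tag.
    private static final Pattern INPUT_NAME = Pattern.compile(
            "<input\\b[^>]*\\bname\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // The URL the form is submitted to, or null if no action attribute is found.
    static String formAction(String html) {
        Matcher m = ACTION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // The names of the parameters the form would submit, in document order.
    static List<String> inputNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INPUT_NAME.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

Hidden inputs show up in the list too, which matters because many login forms expect them to be posted back along with the username and password.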

For help on how to submit a form in httpclient, take a look at the documentation on the POST method.

The httpclient site also contains a basic tutorial.

matt b
A: 

Your best chance of doing this kind of thing and surviving on the real-world web is Selenium-RC.

Basically, you remote-control your browser to do anything that you can do manually (except file uploads).

Many times, I have used this pattern:

  1. Log in with Selenium
  2. Take the cookies from the browser session
  3. Continue with my favourite HTTP library.
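Steps 2 and 3 of that pattern might look roughly like the sketch below. It assumes the cookies have already been read out of the browser session as name/value pairs (the Selenium side is not shown), and it uses the JDK's `java.net.http` as the HTTP library. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class: hands browser cookies over to a plain HTTP request.
class CookieHandoffSketch {

    // Join name/value pairs into a single Cookie header value: "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    // Build a GET request that reuses the logged-in session's cookies.
    static HttpRequest getWithCookies(String url, Map<String, String> cookies) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieHeader(cookies))
                .GET()
                .build();
    }
}
```

The request can then be sent with any `HttpClient`. The server may also check things like the User-Agent, so matching the browser's headers could be necessary.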
flybywire
What stops you from taking the cookie from your browser?
Geo