It's probably a bad idea, but you could use an HTTP library like HttpClient to make the same requests the user's browser would. You will need a header-watching utility to figure out which URLs to hit and which headers to send, but it is possible to automate this. The approach is fragile: if Google ever changes its page layout, ids, classes, or overall page structure, your parsing code will break.
Furthermore, you will need to capture the response the server sends back, which means receiving it at a web endpoint you control. This could be addressed with solution 2, outlined below.
In summary:
Solution 1 - Authenticating
- HttpClient GET against authentication URL.
- TagSoup to parse the page response and store any data required from the page.
- XOM XML parser to work with the response from [1.2], if needed.
- HttpClient requests against authorization URLs
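
The parse-then-request steps of solution 1 can be sketched roughly as follows. This is only an illustration under several assumptions: it uses the JDK's built-in java.net.http classes (Java 11+) instead of the HttpClient library mentioned above, a crude regular expression stands in for TagSoup, and the class name, the method names, the example.invalid URL, and the sample "token" field are all hypothetical. The real login page may need entirely different fields.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LoginFlowSketch {
    // Crude stand-in for a real HTML parser such as TagSoup: finds <input ...> tags.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches double-quoted attributes only, e.g. name="token".
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /** Collects name/value pairs of hidden inputs, in page order. */
    public static Map<String, String> extractHiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                attrs.put(attr.group(1).toLowerCase(), attr.group(2));
            }
            if ("hidden".equalsIgnoreCase(attrs.get("type")) && attrs.containsKey("name")) {
                fields.put(attrs.get("name"), attrs.getOrDefault("value", ""));
            }
        }
        return fields;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    public static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) the follow-up POST carrying the scraped fields. */
    public static HttpRequest buildFormPost(URI uri, Map<String, String> fields) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; a real page would come from the initial GET.
        String page = "<form><input type=\"hidden\" name=\"token\" value=\"a b&c\">"
                + "<input type=\"text\" name=\"user\" value=\"x\"></form>";
        Map<String, String> fields = extractHiddenFields(page);
        HttpRequest post = buildFormPost(URI.create("https://example.invalid/login"), fields);
        System.out.println(post.method() + " " + post.uri() + " body: " + formEncode(fields));
    }
}
```

Keeping the scraping in one small method at least confines the breakage to one place when the page structure changes.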
Solution 2 - Receiving the response from Google
- Launch a Jetty server.
- Set your response URL to localhost:****/whatever when authenticating.
- Accept response in Jetty. Retrieve response in command line application.
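
A minimal sketch of the local-endpoint idea in solution 2 might look like the following. To stay dependency-free it uses the JDK's built-in com.sun.net.httpserver.HttpServer rather than Jetty, so treat it as the same technique with a swapped-in server. The class and method names are hypothetical, port 0 asks the OS for any free port, and nothing here is specific to what Google actually sends back: the handler simply hands the raw query string to the command line application.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class CallbackServerSketch {
    private final HttpServer server;
    // Completed with the raw query string of the first request that arrives.
    private final CompletableFuture<String> query = new CompletableFuture<>();

    public CallbackServerSketch(int port, String path) throws IOException {
        // Bind to the loopback interface only, so the endpoint is not exposed externally.
        server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext(path, exchange -> {
            String raw = exchange.getRequestURI().getRawQuery();
            byte[] body = "You can close this window.".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
            query.complete(raw == null ? "" : raw);
        });
        server.start();
    }

    /** The port actually bound (useful when 0 was requested). */
    public int port() {
        return server.getAddress().getPort();
    }

    /** Future the command line application can block on. */
    public CompletableFuture<String> firstQuery() {
        return query;
    }

    public void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws Exception {
        CallbackServerSketch callback = new CallbackServerSketch(0, "/whatever");
        System.out.println("Response URL: http://localhost:" + callback.port() + "/whatever");
        String received = callback.firstQuery().get(); // blocks until a request arrives
        System.out.println("Received query: " + received);
        callback.stop();
    }
}
```

The CompletableFuture is what bridges the server thread and the command line application: the main thread blocks on it until the redirect arrives, then shuts the server down.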
Disclaimer:
This is all untested and highly theoretical. There may be a better way to do this, but it's a lot of work to avoid simply opening a web browser and letting the user log in.