Hi,

My goal is to connect to an OWA page (Microsoft Office Outlook Web Access - basically an email client) and log in, then read the page that loads and find the inbox count.

To log in, I need to fill in the username and password fields and call a certain JavaScript function whose name and signature I know.

How do I:

  1. Get the DOM for the page?
  2. Update the DOM to fill out the input text fields?
  3. Call that JavaScript function?
  4. Get the new URL for the page I am redirected to?
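
If the login function ultimately just submits an ordinary HTML form, steps 2 to 4 might be possible without a DOM or a script engine, by sending the same POST request directly. The following is only a minimal sketch under that assumption, not OWA's actual login protocol: the class name `FormLogin`, the helper names, and the field names used in the usage note are hypothetical, and the real form action and field names would have to be read from the page source.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

class FormLogin {

    // URL-encodes a single name or value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is guaranteed to exist on every Java platform
            throw new IllegalStateException(ex);
        }
    }

    // Builds an application/x-www-form-urlencoded body from the form fields.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    // Posts the form and returns the redirect target (the Location header),
    // or null if the server did not answer with a 3xx redirect.
    static String postForm(URL action, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes("UTF-8");
        HttpURLConnection connection = (HttpURLConnection) action.openConnection();
        connection.setInstanceFollowRedirects(false); // inspect the redirect ourselves
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        int status = connection.getResponseCode();
        String location = connection.getHeaderField("Location");
        connection.disconnect();
        return (status >= 300 && status < 400) ? location : null;
    }
}
```

A call would look like `FormLogin.postForm(loginUrl, fields)` with a map holding, say, `username` and `password` entries (hypothetical names). A login like this normally also sets a session cookie that has to be sent with later requests; installing a `java.net.CookieManager` via `CookieHandler.setDefault(...)` is one way to have `HttpURLConnection` handle that. If the JavaScript function computes or adds fields before submitting, those would have to be reproduced by hand.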

So far I am able to connect to a webpage and load its page source using the following Java code:

                // open the connection to the welcome page
                callback.status("Opening connection...");
                URLConnection connection = null;
                try
                {
                    connection = url.openConnection();
                }
                catch(IOException ex)
                {
                    // keep the original exception as the cause
                    throw new Exception("I/O Problem while attempting URL connection", ex);
                }

                connection.setDoInput(true);

                // open input stream to read website
                callback.status("Opening data stream...");
                InputStream input = null;
                try
                {
                    input = connection.getInputStream();
                }
                catch(IOException ex)
                {
                    // keep the original exception as the cause
                    throw new Exception("I/O Problem while opening data stream", ex);
                }

                // read website contents
                callback.status("Reading site...");

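                // collect the raw bytes first and decode once at the end, so a
                // multi-byte character split across two reads is not corrupted
                // (requires java.io.ByteArrayOutputStream and java.nio.charset.StandardCharsets)
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int bytesRead;
                try
                {
                    while((bytesRead = input.read(buffer)) != -1)
                    {
                        bytes.write(buffer, 0, bytesRead);
                    }
                }
                catch(IOException ex)
                {
                    throw new Exception("I/O Problem while reading website", ex);
                }

                // assumes the page is UTF-8; the Content-Type header gives the real charset
                String content = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
                System.out.println(content);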

The result is the entire page source being output to the console - great. I also attempted to parse the page to get a DOM object which I can then follow to find my username and password fields:

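                XMLParserConfiguration config = new XML11DTDConfiguration();
                DOMParser parser = new DOMParser(config);
                // the stream must still be unread here: if the loop above already
                // consumed it, a fresh connection and stream are needed
                InputSource inputSource = new InputSource(input);
                try
                {
                    parser.parse(inputSource);
                }
                catch(SAXParseException ex)
                {
                    // report the failure instead of silently swallowing it
                    System.err.println(ex.getLineNumber() + ":" + ex.getColumnNumber() + ": " + ex.getMessage());
                }
                Document document = parser.getDocument();
                visitNode(document, 0);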

But I am getting a SAXParseException: :6:62: White spaces are required between publicId and systemId.

Looks like this line is to blame:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

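So I may need to change the DOMParser's configuration somehow to be lenient enough to "forgive" the white space requirement. (The XML parser apparently expects a system identifier after the public identifier, which this HTML DOCTYPE omits.)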
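
To illustrate the error in isolation, here is a minimal sketch that uses the JDK's built-in XML parser rather than the Xerces classes above. It checks whether a piece of markup parses as XML and, as one possible workaround, strips the DOCTYPE declaration before parsing. The class and method names (`DoctypeCheck`, `xmlParseError`, `stripDoctype`) are hypothetical. Note that real-world HTML is often not well-formed XML, so removing the DOCTYPE addresses only this particular error and the parser may well fail on something else further down the page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class DoctypeCheck {

    // Returns null if the markup parses as XML, otherwise the parser's error message.
    static String xmlParseError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty document so nothing is downloaded.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (Exception ex) {
            return String.valueOf(ex.getMessage());
        }
    }

    // Removes the first <!DOCTYPE ...> declaration (assumes it has no internal subset).
    static String stripDoctype(String markup) {
        return markup.replaceFirst("(?i)<!DOCTYPE[^>]*>", "");
    }
}
```

Calling `xmlParseError` on a page that starts with the DOCTYPE shown above should report an error, while the same markup passed through `stripDoctype` first gets past that declaration.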

+2  A: 

So you want to act like a GUI-less web browser programmatically? Use HtmlUnit; that's exactly how it advertises itself.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.

It is typically used for testing purposes or to retrieve information from web sites.

BalusC
I've looked at this, but it looks like overkill. Plus I get a bunch of exceptions when it parses OWA's JavaScript code, which I can work around by disabling JavaScript, but that kind of defeats the purpose of the question.
Warlax
Exceptions contain information about the cause of the problem. Ignoring them doesn't help us help you find that cause. You know, once the cause is *understood*, the solution is nothing more than obvious :)
BalusC
A: 

I've posted a more specific question here:

http://stackoverflow.com/questions/3283785/sending-an-owa-logon-form-from-java

Warlax
This isn't an answer. Post it as a comment on your own question instead, or just leave it out.
BalusC