Hi,
My goal is to connect to an OWA (Microsoft Office Outlook Web Access, basically a web email client) page, log in, then read the page that loads afterwards and find the inbox count.
To log in, I need to fill in the username and password fields and call a particular JavaScript function whose name and signature I know.
How do I:
- Get the DOM for the page?
- Update the DOM to fill out the input text fields?
- Call that Javascript function?
- Get the new URL for the page I am redirected to?
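A plain `URLConnection` cannot execute the page's JavaScript, so a common workaround is to send the same HTTP request the script ends up sending. The sketch below assumes the login function ultimately submits an ordinary HTML form as a POST. The action URL and the field names `username` and `password` are hypothetical and have to be copied from the real login form in the page source. It relies on `HttpURLConnection` following same-protocol redirects by default and on `getURL()` reporting the final URL once the response has been read. It will not follow an http-to-https redirect, and if the OWA script also sets cookies or hidden fields client-side, those would have to be replicated too (requires Java 10 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class LoginSketch {

    /** Encodes fields as application/x-www-form-urlencoded, the format an HTML form POST uses. */
    public static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** POSTs the fields and returns the URL of the page finally loaded after redirects. */
    public static URL postLogin(URL action, Map<String, String> fields) throws IOException {
        // Keep any session cookies the server sets so later requests stay logged in.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager());
        }
        // URLEncoder output is pure ASCII.
        byte[] body = formEncode(fields).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn = (HttpURLConnection) action.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        // Reading the response is what makes the connection follow redirects.
        try (InputStream in = conn.getInputStream()) {
            in.readAllBytes(); // the logged-in page; parse it here for the inbox count
        }
        return conn.getURL(); // now points at the redirect target, if there was one
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical field names: use the name="..." attributes of the real <input> tags.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", args[1]);
        fields.put("password", args[2]);
        URL action = URI.create(args[0]).toURL();
        System.out.println("Redirected to: " + postLogin(action, fields));
    }
}
```

Run it as `java LoginSketch <form-action-url> <user> <password>`. Posting the form directly also sidesteps the DOM questions, since filling in fields in a local DOM copy never reaches the server anyway.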
So far I am able to connect to a webpage and load its page source using the following Java code:
// open the connection to the welcome page
callback.status("Opening connection...");
URLConnection connection;
try
{
    connection = url.openConnection();
}
catch (IOException ex)
{
    // keep the original exception as the cause so the real error is not lost
    throw new Exception("I/O problem while attempting URL connection", ex);
}
connection.setDoInput(true);

// open input stream to read website
callback.status("Opening data stream...");
InputStream input;
try
{
    input = connection.getInputStream();
}
catch (IOException ex)
{
    throw new Exception("I/O problem while opening data stream", ex);
}

// read website contents: collect the raw bytes first and decode once at the end,
// so multi-byte characters split across buffer boundaries are not corrupted
callback.status("Reading site...");
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int bytesRead;
try
{
    while ((bytesRead = input.read(buffer)) != -1)
    {
        bytes.write(buffer, 0, bytesRead);
    }
}
catch (IOException ex)
{
    throw new Exception("I/O problem while reading website", ex);
}
// decodes with the platform default charset, as the original code did
String content = bytes.toString();
System.out.println(content);
This prints the entire page source to the console, which is great. I also tried parsing the page into a DOM object that I could then traverse to find the username and password fields:
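// parse the page with Xerces into a DOM tree
XMLParserConfiguration config = new XML11DTDConfiguration();
DOMParser parser = new DOMParser(config);
// the input stream was already consumed by the reading loop, so parse the string read from it
InputSource inputSource = new InputSource(new StringReader(content));
try
{
    parser.parse(inputSource);
}
catch (SAXParseException ex)
{
    // report where parsing failed instead of silently ignoring it
    System.err.println("Parse error at " + ex.getLineNumber() + ":" + ex.getColumnNumber()
            + ": " + ex.getMessage());
}
Document document = parser.getDocument();
visitNode(document, 0);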
But I am getting a SAXParseException: :6:62: White spaces are required between publicId and systemId.
It looks like this line is to blame:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
So I may need to change the DOMParser's configuration somehow so that it is lenient enough to "forgive" the whitespace requirement.
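The error is probably not a configuration issue: XML requires a system identifier after a public identifier in a DOCTYPE, while HTML 4.0 (an SGML application) allows the public identifier alone, so the line is valid HTML but not well-formed XML. Column 62 is exactly the closing `>` where the XML parser expected whitespace and a system identifier. One alternative, swapped in here instead of a Xerces setting, is an HTML-aware parser. The sketch below uses the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) to list the `<input>` fields; the class name `FormFieldScanner` and the `type:name` output format are illustrative choices, not part of any API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class FormFieldScanner {

    /** Returns "type:name" for every <input> tag in the HTML, in document order. */
    public static List<String> findInputs(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> has no end tag, so the parser reports it as a "simple" tag
                if (tag == HTML.Tag.INPUT) {
                    Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    found.add(type + ":" + name);
                }
            }
        };
        try {
            // true = ignore any charset declared in <meta>; the text is already a String
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // reading from a StringReader should not fail, but the API declares it
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

Passing it the `content` string from the reading loop should reveal the real names of the username and password fields, which are what a form POST needs.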