I cannot, for the life of me, rig HtmlUnit up to grab this site:

http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true

I'm fairly sure it has to do with the sheer number of scripts running in the background. Perhaps these scripts aren't being given enough time to fully load?

I've also tried simply grabbing bing.com/travel, with no success either. It breaks in the getPage call on the WebClient.

The output gives a plethora of runtimeErrors ("data necessary to complete this operation is not yet available"), all for the same sourceName ("http://www.bing.com/travel/jsxc.vjs?a=common&v=5.5.0-1278007084280").

Then a couple of exceptions are thrown for a missing "(" in a couple of scripts on bing.com.

Then it calls JavaScript and abruptly ends.

I realize this could be any of a handful of problems that others can't see from here. If there are no suggestions, would someone mind running these two URLs through their own HtmlUnit setup to see whether they get basic output of the results? I'm not trying to do anything fancy here, just get some basic text or XML output.

It'd be handy to know if someone else's implementation works so I can keep jury-rigging mine to completion.

CODE:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class test {

    public static void main(String[] args) throws Exception {

        WebClient client = new WebClient();
        System.out.println("webclient loaded");

        HtmlPage currentPage = client.getPage("http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true");
        client.waitForBackgroundJavaScript(10000);
        System.out.println("htmlpage init'd");

        //System.out.println(currentPage.getTitleText());
        String textSource = currentPage.asXml();
        System.out.println(textSource);

    }

}
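For readability, the search URL passed to getPage can be assembled from its individual query parameters instead of being pasted as one long string. This is only a sketch: the class and method names (FlightSearchUrl, build, exampleParams) are made up for illustration, and the parameter names are simply read off the URL above, not taken from any Bing documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FlightSearchUrl {

    // Base path copied from the URL in the question.
    static final String BASE = "http://www.bing.com/travel/flight/flightSearch";

    // Form-encodes one key or value (space -> '+', '/' -> %2F, and so on).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters in insertion order as ?k1=v1&k2=v2...
    static String build(Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE);
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep).append(enc(p.getKey())).append('=').append(enc(p.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    // The decoded parameter values from the question's URL.
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("form", "FORMTRVLGENERIC");
        p.put("q", "flights from SLC to BKK leave 07/30/2010 return 08/11/2010 adults:1 class:COACH");
        p.put("stoc", "0");
        p.put("vo1", "Salt Lake City, UT (SLC) - Salt Lake City International Airport");
        p.put("o", "SLC");
        p.put("ve1", "Bangkok, Thailand (BKK) - Suvarnabhumi International");
        p.put("e", "BKK");
        p.put("d1", "07/30/2010");
        p.put("r1", "08/11/2010");
        p.put("p", "1");
        p.put("b", "COACH");
        p.put("baf", "true");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build(exampleParams()));
    }
}
```

The result of build(exampleParams()) can then be handed to client.getPage(...) in place of the string literal.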

Thanks!

+1  A: 

Try adding this before the getPage call:

client.setThrowExceptionOnScriptError(false);

It takes a long time to run, and boy does it spew out logging... but eventually a page came out:

htmlpage init'd
<?xml version="1.0" encoding="utf-8"?>
<html id="">
  <head>
   ...
Rodney Gitzel
Well, son of a gun... thanks! So is it worth going through to fix the errors and warnings? As long as I get a page out, maybe it's not worth the effort...
Stu Kalide
From what I recall, a lot of it was just logging info. That's typical of my HtmlUnit tests; the console spews like crazy. If the page comes out, don't worry about it.
Rodney Gitzel
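If the console noise gets in the way, one option is to turn the HtmlUnit loggers off. HtmlUnit logs through Apache Commons Logging, so the sketch below only has an effect under the assumption that Commons Logging is falling back to java.util.logging (for example, no log4j on the classpath). The class name QuietHtmlUnitLogging is made up for illustration; the logger name is just the package prefix from the imports in the question.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost to GC.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turns off this logger; child loggers (e.g. ...htmlunit.javascript)
    // inherit the level unless they set their own.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Assumption: this runs before the WebClient is created.
        System.out.println("HtmlUnit logger level: " + HTMLUNIT_LOGGER.getLevel());
    }
}
```

Calling QuietHtmlUnitLogging.silence() at the top of main, before new WebClient(), should cut most of the output in that setup. It hides warnings as well as noise, so it may be worth leaving logging on until the page comes out correctly.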