I am using URL.openStream()
to download many html pages for a crawler that I am writing. The method runs great locally on my mac however on my schools unix server the method is extremely slow. But only when downloading the first page.
Here is the method that downloads the page:
public static String download(URL url) throws IOException {
Long start = System.currentTimeMillis();
InputStream is = url.openStream();
System.out.println("\t\tCreated 'is' in "+((System.currentTimeMillis()-start)/(1000.0*60))+"minutes");
...
}
And the main method that invokes it:
LinkedList<URL> ll = new LinkedList<URL>();
ll.add(new URL("http://sheldonbrown.org/bicycle.html"));
ll.add(new URL("http://www.trentobike.org/nongeo/index.html"));
ll.add(new URL("http://www.trentobike.org/byauthor/index.html"));
ll.add(new URL("http://www.myra-simon.com/bike/travel/index.html"));
for (URL tmp : ll) {
System.out.println();
System.out.println(tmp);
CrawlerTools.download(tmp);
}
Output locally (Note: all are fast):
http://sheldonbrown.org/bicycle.html Created 'is' in 0.00475minutes
http://www.trentobike.org/nongeo/index.html Created 'is' in 0.005083333333333333minutes
http://www.trentobike.org/byauthor/index.html Created 'is' in 0.0023833333333333332minutes
http://www.myra-simon.com/bike/travel/index.html Created 'is' in 0.00405minutes
Output on School Machine Server (Note: All are fast except the first one. The first one is slow regardless of what the first site is):
http://sheldonbrown.org/bicycle.html Created 'is' in 3.2330666666666668minutes
http://www.trentobike.org/nongeo/index.html Created 'is' in 0.016416666666666666minutes
http://www.trentobike.org/byauthor/index.html Created 'is' in 0.0022166666666666667minutes
http://www.myra-simon.com/bike/travel/index.html Created 'is' in 0.009533333333333333minutes
I am not sure if this is a Java issue (*A problem in my Java code) or a server issue. What are my options?
When run on the server this is the output of the time command:
real 3m11.385s
user 0m0.277s
sys 0m0.113s
I am not sure if this is relevant... What should I do to try and isolate my problem..?