Hi all, I'm playing around with a webpage fetcher in Java right now, and I'm curious what the best way is to turn relative image paths into absolute URLs.

I have a link, e.g.: http://www.nytimes.com/2010/07/08/technology/personaltech/08pogue.html?ref=technology

When I crawl that page, I might find img src paths like

"../public/images/header.jpg"
"../../test/logo.gif"

where each path is relative to the directory the page itself lives in.

The question is: is there a Java library that can turn these into absolute URLs, so that the first one becomes

http://www.nytimes.com/2010/07/08/technology/public/images/header.jpg

?

Thanks

+2  A: 

The URL class should be able to do this with its URL(URL context, String spec) constructor, see: http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/net/URL.html#URL%28java.net.URL,%20java.lang.String%29

E.g.:

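// needs import java.net.URL; both constructors throw java.net.MalformedURLException
URL pageURL = new URL("http://www.nytimes.com/2010/07/08/technology/personaltech/08pogue.html?ref=technology");
// resolves the relative path against the page's directory, collapsing the ".."
URL imageURL = new URL(pageURL, "../public/images/header.jpg");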

Warning: I haven't tested this.
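For anyone who wants to check it, here is a small self-contained sketch that resolves both paths from the question with the same two-argument URL constructor. The class name ResolveImageUrls and the resolve helper are hypothetical names made up for this example; the helper just wraps the checked MalformedURLException in an IllegalArgumentException so callers don't have to handle it.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveImageUrls {

    // Hypothetical helper: resolve an img src (relative or absolute) against the
    // URL of the page it was found on, using URL(URL context, String spec).
    static String resolve(String pageUrl, String src) {
        try {
            URL base = new URL(pageUrl);
            // ".." segments are collapsed relative to the base page's directory;
            // the base page's query string (?ref=...) is not carried over.
            return new URL(base, src).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + src + " against " + pageUrl, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.nytimes.com/2010/07/08/technology/personaltech/08pogue.html?ref=technology";
        // prints http://www.nytimes.com/2010/07/08/technology/public/images/header.jpg
        System.out.println(resolve(page, "../public/images/header.jpg"));
        // prints http://www.nytimes.com/2010/07/08/test/logo.gif
        System.out.println(resolve(page, "../../test/logo.gif"));
    }
}
```

Since the page is in /2010/07/08/technology/personaltech/, one ".." climbs to /technology/ and two climb to /08/. On newer JDKs (20+) these URL constructors are deprecated, and java.net.URI's resolve(String) method performs the same kind of relative resolution.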

fd
Awesome, worked perfectly! Thanks.