I want to use XPath (in Java) to parse XML files. However, these XML files are only available on the web, and downloading them all manually is not an option (of course, they still have to be "downloaded" in some sense in order to be processed).

So basically my question is: how do I take a URI object and convert it into a File object? Do I need to use SCP or something in between to download the file? Any code, tutorials, or just general advice would be much appreciated.

I tried this:

    URI uri = new URI("http://www.somefiles.com/myfile.xml");
    InputStream is = uri.toURL().openStream();
    File xmlDocument = new File(uri);

But this threw an error: URI scheme is not "file".

+2  A: 

You can make it more complicated, but this could be as simple as opening a stream from a URL.

    InputStream in = remoteURI.toURL().openStream();

Now this is not a File object as originally requested, but I'm guessing your XPath library can process a generic InputStream. If not, you'll have to save the InputStream above to a temp file and create a File object for that file.
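As a rough sketch of the "process a generic InputStream" route: the standard `javax.xml.xpath` API accepts an `InputSource`, which can wrap any stream. The class name `StreamXPath`, the helper `evaluate`, and the sample XML are made up for illustration, and a `ByteArrayInputStream` stands in for the stream you would get from `remoteURI.toURL().openStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class StreamXPath {
    // Hypothetical helper: evaluates an XPath expression against XML read
    // from any InputStream and returns the result as a string.
    public static String evaluate(InputStream in, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(in));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for remoteURI.toURL().openStream(); the XML is invented.
        String xml = "<catalog><book id=\"1\"><title>Example</title></book></catalog>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(evaluate(in, "/catalog/book/title"));
        }
    }
}
```

No File object is involved at any point, so the "URI scheme" check in the File constructor never comes into play.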

Chadwick
That gives a "URI scheme is not "file"" error
Ankur
You still passed the `uri` to the File constructor. You need to make a File using the InputStream `is` instead.
Andy
Yes thanks Andy
Ankur
+1  A: 

Try writing the XML to disk first:

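    // imports: java.io.File, java.io.FileOutputStream, java.io.InputStream, java.io.OutputStream
    File tempDir = new File(System.getProperty("java.io.tmpdir"));
    File xmlDocument = new File(tempDir, "theXml.xml");
    // try-with-resources closes both streams even if the copy fails part-way
    try (InputStream in = remoteURI.toURL().openStream();
         OutputStream out = new FileOutputStream(xmlDocument)) {
      byte[] buffer = new byte[8192];
      int read;
      // Copy in chunks rather than one byte at a time
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    }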
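If you are on Java 7 or later, the copy loop above can probably be replaced by `java.nio.file.Files.copy`. This is only a sketch: the class name `TempFileDownload`, the helper `saveToTempFile`, and the `download-` file prefix are assumptions for illustration, and an in-memory stream stands in for `remoteURI.toURL().openStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempFileDownload {
    // Hypothetical helper: copies everything from the stream into a fresh
    // temp file and returns that file as a java.io.File.
    public static File saveToTempFile(InputStream in) throws IOException {
        Path target = Files.createTempFile("download-", ".xml");
        // createTempFile already created the file, so it must be replaced
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        File file = target.toFile();
        file.deleteOnExit(); // best-effort cleanup when the JVM exits
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for remoteURI.toURL().openStream()
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(xml)) {
            File xmlDocument = saveToTempFile(in);
            System.out.println(xmlDocument.getAbsolutePath()
                    + " (" + xmlDocument.length() + " bytes)");
        }
    }
}
```

Using `Files.createTempFile` instead of a fixed name like `theXml.xml` avoids two downloads overwriting each other, at the cost of a less predictable file name.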

However, if you just need to pull some data from the XML using XPath, you don't have to write anything to disk:

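    // imports: java.io.InputStream, javax.xml.transform.*, javax.xml.transform.dom.DOMResult,
    //          javax.xml.transform.stream.StreamSource, javax.xml.xpath.*, org.w3c.dom.Document
    Document document;
    try (InputStream in = remoteURI.toURL().openStream()) {
      // An identity transform copies the streamed XML into an in-memory DOM tree
      DOMResult result = new DOMResult();
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.transform(new StreamSource(in), result);
      document = (Document) result.getNode();
    }

    XPath xpath = XPathFactory.newInstance().newXPath();
    xpath.evaluate("...", document);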
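If the expression can match several nodes, you may want a node list back rather than a single string. The sketch below shows one way to do that; note that it builds the Document with `DocumentBuilder` instead of the Transformer approach above, which is a swapped-in alternative rather than a requirement. The class name `XPathNodeSet`, the helper `selectText`, and the sample XML are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathNodeSet {
    // Hypothetical helper: parses the stream into a DOM Document, then
    // collects the text content of every node the expression matches.
    public static List<String> selectText(InputStream in, String expression)
            throws Exception {
        Document document =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
                (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for remoteURI.toURL().openStream(); the XML is invented.
        String xml = "<items><item>a</item><item>b</item></items>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(selectText(in, "/items/item"));
        }
    }
}
```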
Michael Angstadt