views:

2736

answers:

3

I have files on my file system, on Windows XP. I want to parse them using Java (JRE 1.6).

Problem is, I don't understand how Java and Xerces work together when the file path has spaces in it.

If the file has no spaces in its path, all works fine.

If there are spaces, I may have this kind of trouble, even if I call the parser with a FileInputStream instance :

java.net.UnknownHostException: .
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

(sun.net.ftp.FtpClient.openServer ??? Wtf ?)

or else this kind of trouble :

java.net.MalformedURLException: unknown protocol: d
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)

(It says unknown protocol: d because, I guess, the file is on the D drive.)

Has anyone any clue of why that happens, and how to circumvent the problem ? I tried to supply my own EntityResolver but my log tells me it is not even called before the crash.


EDIT:

Here is the code calling the parser.

public Document fileToDom(File file) throws ProcessException {
 Document doc = null;
 try {
  DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = db.newDocumentBuilder();
  if (this.errorHandler!=null){
   builder.setErrorHandler(this.errorHandler);}
  else {
   builder.setErrorHandler(new DefaultHandler());
  }
  FileInputStream test= new FileInputStream(file);
  doc = builder.parse(test);
  ...
 } catch (Exception e) {...}
 ...
}


For the moment I find myself forced to remove the DOCTYPE before the parse, which removes all the problems, and the DTD validation... Not so great a solution.

+1  A: 

Try this URI style:

file:///d:/folder/folder%20with%20space/file.xml

victor hugo
+2  A: 

Are you just using DocumentBuilder.parse(filename)?

If so, that's failing because it expects a URI. Open a FileInputStream to the file, and then pass that to DocumentBuilder.parse(InputStream).

Jon Skeet
I am using DocumentBuilder.parse(InputStream).
subtenante
A: 

It looks like it's trying to connect to a URL in the doctype header so it can download it in order to validate the document against the downloaded DTD.

Dan Fleet