The system I'm running on is Windows XP, with JRE 1.6.

I run this:

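import java.io.File;

public class FileUrlTest {
    public static void main(String[] args) {
        try {
            // Convert a Windows path to a URI, then to a URL, and print it
            System.out.println(new File("C:\\test a.xml").toURI().toURL());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}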

and I get this: file:/C:/test%20a.xml

Why doesn't the resulting URL have two slashes before the C:? I expected file://C:.... Is this normal behaviour?


EDIT:

From the Java source code, java.net.URLStreamHandler.toExternalForm(URL):

    result.append(":");
    if (u.getAuthority() != null && u.getAuthority().length() > 0) {
        result.append("//");
        result.append(u.getAuthority());
    }

It seems that the authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL, and is it really absent from the file protocol?

+1  A: 

As far as using it in a browser is concerned, it doesn't matter. I have typically seen file:///... but one, two or three '/' will all work. This makes me think (without looking at the java documentation) that it would be normal behavior.

Jeremy Cron
I understand 3 slashes: // + /C:, which is logical. The problem is that I am not using a browser.
subtenante
3 Slashes makes sense on a UNIX-type system; the third slash is the root directory. file:///etc/passwd is the /etc/passwd file.
R. Bemrose
+2  A: 

That's an interesting question.

First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.

RFC2396 does not actually require two slashes. According to section 3:

The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:

<scheme>:<scheme-specific-part>

Having said that, RFC2396 has been superseded by RFC3986, which states

The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  hier-part   = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

So, there you go. Since file URIs have no authority segment, they're forbidden from starting with //.

However, that RFC didn't come around until 2005, and Java references RFC2396, so I don't know why it's following this convention, as file URLs before the new RFC have always had two slashes.
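To see what Java itself reports as the authority, here is a minimal sketch using java.net.URI. The class name FileUriAuthority and the helper methods authorityOf and pathOf are made up for illustration, and the sample strings are just variations on the URL from the question. It only parses strings, so it does not depend on Windows or on the file existing.

```java
import java.net.URI;

public class FileUriAuthority {
    // Hypothetical helper: the authority component of the URI string,
    // or null when java.net.URI finds none.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    // Hypothetical helper: the decoded path component of the URI string.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:/C:/test%20a.xml",            // form printed in the question
            "file:///C:/test%20a.xml",          // empty authority, three slashes
            "file://localhost/C:/test%20a.xml"  // explicit authority
        };
        for (String s : samples) {
            System.out.println(s + " -> authority=" + authorityOf(s)
                    + ", path=" + pathOf(s));
        }
    }
}
```

As far as I can tell, the one-slash and three-slash forms both parse to the same path with no authority, and only the localhost form reports one. That would fit the toExternalForm code quoted in the question, which writes the // only when the authority is non-empty.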

R. Bemrose
Yes, but: http://tools.ietf.org/html/rfc1738. Section 3.10 tells me files should have double slashes in URLs.
subtenante
And the end of section 1.1 of RFC3986 has this example: file:///etc/hosts.
subtenante
I noticed that, too. Sometimes I think they should just make the specs easier to read.
R. Bemrose
Well, thanks anyways. :)
subtenante