I have the following scenario:

URL u1 = new URL("http://www.yahoo.com/");
URL u2 = new URL("http://www.yahoo.com");

if (u1.equals(u2)) {
    System.out.println("yes");
}
if (u1.toURI().equals(u2.toURI())) {
    System.out.println("uri equality");
}
if (u1.toExternalForm().equals(u2.toExternalForm())) {
    System.out.println("external form equality");
}
if (u1.toURI().normalize().equals(u2.toURI().normalize())) {
    System.out.println("uri normalized equality");
}

None of these checks are succeeding. Only the path differs: u1 has a path of "/" while u2 has a path of "". Are these URLs pointing to the same resource and is there a way for me to check such a thing without opening a connection? Am I misunderstanding something fundamental about URLs?

EDIT: I should state that a non-hacky check is desired. Is it reasonable to say that an empty path == "/"? I was hoping not to need this kind of code

+1  A: 

From JavaOne 2007:

The second puzzle, aptly titled "More Joys of Sets" has the user create HashMap keys that consist of several URL objects. Again, most of the audience was unable to guess the correct answer.

The important thing the audience learned here is that the URL object's equals() method is, in effect, broken. In this case, two URL objects are equal if they resolve to the same IP address and port, not just if they have equal strings. However, Bloch and Pugh point out an even more severe Achilles' Heel: the equality behavior differs depending on if you're connected to the network, where virtual addresses can resolve to the same host, or if you're not on the net, where the resolve is a blocking operation. So, as far as lessons learned, they recommend:

Don't use URL; use URI instead. URI makes no attempt to compare addresses or ports. In addition, don't use URL as a Set element or a Map key.
For API designers, the equals() method should not depend on the environment. For example, in this case, equality should not change if a computer is connected to the Internet versus standalone.
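
As a small illustration of the first recommendation, here is a minimal sketch that uses URI rather than URL as a Set element. The class and method names (`UriSetDemo`, `distinctCount`) are made up for this example. URI's equals() and hashCode() compare components textually, so no host lookup is involved.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of any library.
class UriSetDemo {

    // Counts distinct entries using URI.equals()/hashCode(), which compare
    // the parsed components and never resolve host names.
    static int distinctCount(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            seen.add(URI.create(spec));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // The first two strings are identical; the third has an empty path.
        System.out.println(distinctCount(
                "http://www.yahoo.com/",
                "http://www.yahoo.com/",
                "http://www.yahoo.com"));
    }
}
```

Note that the set still treats "http://www.yahoo.com/" and "http://www.yahoo.com" as two different elements, which is exactly the behaviour the question is about.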


From the URI.equals() documentation:

For two hierarchical URIs to be considered equal, their paths must be equal and their queries must either both be undefined or else be equal.

In your case, the two paths are different: one is "/" and the other is "".
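
To see the difference directly, a minimal sketch along these lines prints the path component of each URI. The class and method names (`PathInspection`, `rawPath`) are made up for the example. It also shows that normalize() leaves an empty path empty, which is why the "uri normalized equality" check in the question does not succeed.

```java
import java.net.URI;

// Hypothetical demo class, not part of any library.
class PathInspection {

    // Returns the raw (still-encoded) path component of the given URI string.
    static String rawPath(String spec) {
        return URI.create(spec).getRawPath();
    }

    // Returns the raw path after URI.normalize(), which only removes
    // "." and ".." segments and does not turn an empty path into "/".
    static String normalizedRawPath(String spec) {
        return URI.create(spec).normalize().getRawPath();
    }

    public static void main(String[] args) {
        System.out.println("[" + rawPath("http://www.yahoo.com/") + "]");
        System.out.println("[" + rawPath("http://www.yahoo.com") + "]");
        System.out.println("[" + normalizedRawPath("http://www.yahoo.com") + "]");
    }
}
```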


According to the URI RFC (RFC 3986), §6.2.3:

Implementations may use scheme-specific rules, at further processing cost, to reduce the probability of false negatives. For example, because the "http" scheme makes use of an authority component, has a default port of "80", and defines an empty path to be equivalent to "/", the following four URIs are equivalent:

 http://example.com
 http://example.com/
 http://example.com:/
 http://example.com:80/

It seems that java.net.URI doesn't apply these scheme-specific rules, so its equals() reports the URIs above as different.
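
If you want those scheme-specific rules without opening a connection, one option is to apply them yourself before comparing. The following is only a sketch under stated assumptions, not a JDK facility: the class and method names (`HttpUriNormalizer`, `normalizeHttp`, `sameHttpResource`) are made up, it handles only the "http" rules quoted above (empty path becomes "/", default port 80 is dropped), and it leaves every other URI to plain URI.normalize().

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of the JDK.
class HttpUriNormalizer {

    // Applies the "http" rules from RFC 3986 §6.2.3 on top of URI.normalize():
    // an empty path becomes "/" and an explicit port 80 is removed.
    static URI normalizeHttp(URI uri) {
        URI u = uri.normalize();
        String scheme = u.getScheme();
        if (scheme == null || !scheme.equalsIgnoreCase("http") || u.getHost() == null) {
            return u; // not a server-based http URI: leave it alone
        }
        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        int port = (u.getPort() == 80) ? -1 : u.getPort();

        // Rebuild from the raw components so existing escapes are preserved.
        StringBuilder sb = new StringBuilder("http://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(u.getHost().toLowerCase(Locale.ROOT));
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return URI.create(sb.toString());
    }

    // Compares two URIs after the http-specific normalization above.
    static boolean sameHttpResource(URI a, URI b) {
        return normalizeHttp(a).equals(normalizeHttp(b));
    }

    public static void main(String[] args) {
        URI u1 = URI.create("http://www.yahoo.com/");
        URI u2 = URI.create("http://www.yahoo.com");
        System.out.println(sameHttpResource(u1, u2));
    }
}
```

This comparison is purely syntactic, so it never touches the network. Paths such as "/foo" and "/foo/" are still treated as different, since the RFC rule only covers the empty path.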


Colin Hebert
...this doesn't answer the question at all.
Zarel
Interesting.. but then the toURI() test would succeed if they were in fact equal.
SB
@SB, updated with more RFC and more documentation :)
Colin Hebert
@Colin, now *this* answers the question. :)
Zarel
Thanks! Out of curiosity, why did you make it community wiki?
SB
@SB, I expected that someone could help me to fill the blanks. Well I did it all by myself :) @Zarel, not quite, I'm still looking for a way to do the equality check :)
Colin Hebert
@Colin - what do you think of the equality check I propose in my answer?
SB
@SB, even if it works in your case, it doesn't work with "http://example.com:/" or "http://example.com:80/" I think you could write a URIUtil with a static `equals(URI, URI)` based on the JDK source code but with some additional rules. http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/net/URI.java#URI.equals%28java.lang.Object%29
Colin Hebert
Ah I see. You are going for the broad case. In my code, I'm using the same authority as the passed in URL and just fixing the path.
SB
A: 

Strictly speaking, they are not equal. Treating the trailing slash (/) as optional is only a common convention, not a requirement. A server could display different pages for

http://www.yahoo.com/foo/

and for

http://www.yahoo.com/foo

That is even possible for the URLs you provided; I believe the HTTP header could skip that slash.

Wernight
Right, but can there be logic that distinguishes www.yahoo.com from www.yahoo.com/ ?
SB
`example.com/foo/` and `example.com/foo` are different, yes, but `example.com` and `example.com/` are exactly the same.
Zarel
A: 

Consider using the URI class from Apache Commons HttpClient: http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/URI.html

Andriy Sholokh