
Hi,

The following line of code gives an exception. Is this a bug in the framework? If not, what approach could I take instead?

It seems to be the ":" (colon) that causes the issue; however, I do see such URIs working fine on production websites (i.e. it seems to be a valid URI in the real world).

Uri relativeUri = new Uri("http://test.com/asdf").MakeRelativeUri(new Uri("http://test.com/xx:yy"));
// gives => System.UriFormatException: A relative URI cannot be created because the 
// 'uriString' parameter represents an absolute URI

Uri relativeUri2 = new Uri("http://test.com/asdf").MakeRelativeUri(new Uri("http://test.com/xxyy"));
// this works - removed the colon between the xx and yy

PS. Given the above, what .NET class/method could I use (noting I am parsing an HTML page from the web) to take (a) the page URI and (b) the relative string from an HTML href attribute (e.g. "/xx:yy" in this case) and return a valid URI that addresses that resource?

In other words, how do I mimic what a browser does when it combines the href and the page URI to produce the URI it navigates to when you click the link?

+1  A: 

Colons play a special role in URLs (to denote a port, for instance) and are therefore 'reserved' (see here).

URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.

So, the colon should be escaped.
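As a minimal C# sketch of that advice, assuming the colon sits inside a single path segment: Uri.EscapeDataString percent-encodes the colon as %3A, and Uri.TryCreate with UriKind.Relative shows why the raw form fails (it is read as "scheme:..." just as in the question's exception) while the escaped form is accepted. The class and method names are illustrative only.

```csharp
using System;

static class ColonEscapeDemo
{
    // Percent-encodes one path segment; ':' becomes "%3A".
    public static string EscapeSegment(string segment) => Uri.EscapeDataString(segment);

    static void Main()
    {
        string escaped = EscapeSegment("xx:yy");
        Console.WriteLine(escaped); // xx%3Ayy

        // A literal colon in the first segment is taken as a scheme separator,
        // so the string cannot be used as a relative URI...
        Console.WriteLine(Uri.TryCreate("xx:yy", UriKind.Relative, out _)); // False

        // ...while the escaped form is accepted as a relative reference.
        Console.WriteLine(Uri.TryCreate(escaped, UriKind.Relative, out _)); // True
    }
}
```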

Shane C. Mason
Thanks Shane - I've made the question more specific about what would help me out.
Greg
A: 

If a colon is found, it tries to parse the value that follows the colon as a port number, and it fails if you don't provide a valid port number. See here for an example of a similar issue, and the MSDN page for UriFormatException for details.

Tanner
Thanks Tanner - I've made the question more specific about what would help me out.
Greg
+3  A: 

I consider it a bug.

RFC 1738 says that ":" (amongst other characters) may be reserved for special meaning within a scheme. However, the http scheme does not reserve it in the path part:

Within the <path> and <searchpart> components, "/", ";", "?" are reserved.

(Not :.)

hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

So http://test.com/xx:yy is a valid URI. The newer RFC 3986 agrees:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

However, relativised against http://test.com/asdf, the resulting xx:yy would be read as an absolute URI (with scheme "xx") rather than as a valid relative URI:

path-noscheme = segment-nz-nc *( "/" segment )
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"

So MakeRelativeUri is partly right to report a problem, but really it should fix it automatically, by encoding the ":" (valid in an absolute URI) as %3A (valid in the first segment of a relative URI).
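The automatic fix described above can be sketched as a small string transformation. EscapeFirstSegmentColons is a hypothetical helper, not part of .NET: it encodes colons only in the first segment of a relative-path reference (the part the path-noscheme rule restricts) and leaves later segments, the query and the fragment untouched.

```csharp
using System;

static class RelativeRefFixer
{
    // Hypothetical helper (not a .NET API): percent-encodes any ':' in the
    // first segment so the reference matches RFC 3986's path-noscheme rule
    // and cannot be mistaken for "scheme:...".
    public static string EscapeFirstSegmentColons(string relativeRef)
    {
        // The first segment ends at the first '/', '?' or '#' (or end of string).
        int end = relativeRef.IndexOfAny(new[] { '/', '?', '#' });
        if (end < 0)
        {
            end = relativeRef.Length;
        }

        string firstSegment = relativeRef.Substring(0, end);
        string remainder = relativeRef.Substring(end);

        // Colons after the first segment are legal, so only the first segment changes.
        return firstSegment.Replace(":", "%3A") + remainder;
    }

    static void Main()
    {
        string fixedRef = EscapeFirstSegmentColons("xx:yy");
        Console.WriteLine(fixedRef); // xx%3Ayy

        // The repaired string is accepted as a relative URI.
        Uri relative = new Uri(fixedRef, UriKind.Relative);
        Console.WriteLine(relative.OriginalString); // xx%3Ayy

        // Root-relative references have an empty first segment and pass through unchanged.
        Console.WriteLine(EscapeFirstSegmentColons("/xx:yy")); // /xx:yy
    }
}
```

RFC 3986 (section 4.2) also allows a different repair that keeps the colon literal: prefix the segment with a dot-segment, as in "./xx:yy".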

I would generally try to avoid MakeRelativeUri in favour of root-relative URIs, which are easier to extract and don't have this problem (/xx:yy is OK).
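For the page-URI-plus-href case in the question, here is a minimal sketch, assuming .NET's Uri(Uri baseUri, Uri relativeUri) constructor performs standard reference resolution. The hypothetical HrefResolver.Resolve tries the href as a relative reference first and otherwise treats it as absolute (so "xx:yy" becomes a URI with scheme "xx", as discussed above). ToRootRelative extracts the root-relative form recommended above. It ignores browser details such as whitespace trimming or a <base href> element.

```csharp
using System;

static class HrefResolver
{
    // Combines a page URI and a raw href value into an absolute URI,
    // roughly as a browser does when the link is followed.
    public static Uri Resolve(Uri pageUri, string href)
    {
        // UriKind.Relative asks .NET to treat the href strictly as a relative
        // reference, e.g. "/xx:yy" or "../a".
        if (Uri.TryCreate(href, UriKind.Relative, out Uri relative))
        {
            return new Uri(pageUri, relative);
        }

        // Otherwise the href is already absolute, e.g. "http://other.com/x".
        return new Uri(href, UriKind.Absolute);
    }

    // Root-relative form of an absolute URI: path, query and fragment only.
    public static string ToRootRelative(Uri absolute)
    {
        return absolute.PathAndQuery + absolute.Fragment;
    }

    static void Main()
    {
        Uri page = new Uri("http://test.com/asdf");
        Uri target = Resolve(page, "/xx:yy");
        Console.WriteLine(target.AbsoluteUri);
        Console.WriteLine(ToRootRelative(target));
    }
}
```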

bobince
Thanks bobince, that's great - do you know of a direct .NET method that gives the root-relative URI from a PageURI + HrefString? Just looking for one at the moment... or do you have to "do it yourself"?
Greg
Actually, I should probably start a new question for this and mark this one as finished... I'll do that.
Greg
Created this specific question at http://stackoverflow.com/questions/2144150/c-question-how-do-i-convert-a-pageuri-href-to-an-absolute-url-uri
Greg
@bobince: really great answer - especially the solid references.
ladenedge