I'm writing a .NET 3.5 app and using Uri.IsWellFormedUriString(string uriString, UriKind uriKind) with UriKind.Absolute to verify user-entered URIs. While playing with the application, I got a bit worried and confused as to why something like:

http://ddd

is a valid URI? What gives? I know it's because it's part of the RFC, but why is it valid in the first place?

The only time I've ever seen URIs like that is for corporate, internal Intranets like

http://companyinet

or

http://localhost (which is very popular, but also a special case)

I do not want to write my own regular expression, as there are so many varying URI regexes. However, I also do not want users entering URIs like these, which aren't publicly accessible.

Any idea or thoughts? Thanks.

+18  A: 

http://ddd is valid because it does identify a resource. In this case, it points to the web server (hopefully) of the computer 'ddd' on the local network.

URI stands for Uniform Resource Identifier, not world wide web resource identifier. file:///blah.txt is also a valid URI.

Malfist
+2  A: 

It is a valid URI because it follows the syntax of URIs: it has a scheme and a scheme-specific part ('http' is the scheme, ':' separates the two, and '//ddd' is the scheme-specific part).

In the case of a HTTP URI, it also follows the syntax for those, with 'ddd' being a valid host name.

The syntax of URIs is defined in http://www.ietf.org/rfc/rfc2396.txt (since obsoleted by RFC 3986).
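To make the scheme/host split concrete, here is a minimal sketch in Python rather than .NET, using the standard library's urlsplit. The helper name describe is made up for illustration. It only parses the string and never touches the network, which is why a single-label host like 'ddd' passes.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into the pieces discussed above (hypothetical helper)."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


# Purely syntactic: no DNS lookup or HTTP request happens here.
for candidate in ("http://ddd", "http://companyinet", "http://localhost"):
    print(candidate, "->", describe(candidate))
```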

ankon
+14  A: 

That's because it IS a perfectly valid URI, as you mention.

I'd alter your strategy slightly... If you want URIs that are not only valid (as in well-formed), but also valid, in the sense that they actually point to a site, you'll have to add one more step.

After the string validation, issue a HEAD request to ping the URL. If it returns a 2xx status code, you're probably good to go. This will work in most situations, but is not without caveats and exceptions.
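A rough sketch of that HEAD check, written in Python's standard library rather than with .NET's HttpWebRequest. The function name url_responds and the 5-second timeout are assumptions for illustration. urlopen follows redirects by default, and this sketch treats any error (DNS failure, refused connection, 4xx/5xx) as "not reachable".

```python
import http.client
import urllib.error
import urllib.request


def url_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` ends in a 2xx status.

    Hypothetical helper: any failure is reported as False.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # URLError covers DNS failures and HTTP error statuses (HTTPError);
        # ValueError covers strings that are not URLs at all.
        return False


# Example call (needs network access, so it is left commented out):
# print(url_responds("http://example.com/"))
```

Note that this runs from the machine doing the check, so a host such as http://ddd that resolves on that machine's network will still pass. Some servers also reject HEAD outright, in which case falling back to GET may be needed.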

jason
Yeah, speaking of caveats, aren't you forgetting the redirects (3xx)?
shylent
I was afraid of this. Don't suppose you could expand upon the "caveats and exceptions"? The only one I can think of is if the user is using a local webserver, in which case the server will respond with a 2XX status.
Chad
Most HTTP clients follow redirects on HEAD and GET requests before actually returning. I left out any specifics because every implementation differs wildly, depending on how you access the URL.
jason
Ah ok, I was just wondering in case I was misunderstanding something. In any case, your answer hits the nail on the head: to test if the URL is truly valid (points to something), the only definitive way is to actually follow it!
shylent
Ah right. I am using the .NET HttpWebRequest/Response, which has an option to follow redirects. In my application this is set to default, which I believe is true.
Chad
You may want to cache at least the common results. Google.com does not need to be checked every time, etc. Perhaps premature, but something to keep an eye on and fun to code up.
jms
+6  A: 

Because it conforms to RFC 1738 (as well as the URI specification of RFC 2396).

The RFC makes specific allowances for URIs that consist only of a scheme and a scheme-specific part, in this case a hostname. As long as it identifies a resource and conforms to the syntax of URIs, it is valid.

LBushkin
+3  A: 

You answered the question yourself. It's a "valid" (well-formed) URI by the RFC spec's definition ipso facto.

To help solve your actual task, do some additional checks in your regex for one or more dots in the host name (don't forget to escape them!), or possibly try to hit the resource itself to see if it actually responds.
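The dot check can also be done without a regex by parsing the host first. This is a minimal Python sketch, not .NET code, and has_dotted_host is a made-up name. It assumes "looks public" simply means "the host has at least two non-empty labels", which is only a heuristic.

```python
from urllib.parse import urlsplit


def has_dotted_host(uri: str) -> bool:
    """Heuristic: True if the host has at least two dot-separated labels."""
    try:
        host = urlsplit(uri).hostname
    except ValueError:
        # urlsplit rejects some malformed authorities, e.g. bad IPv6 brackets.
        return False
    if not host:
        return False
    # Ignore a single trailing root dot ("example.com." equals "example.com").
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


print(has_dotted_host("http://ddd"))          # False
print(has_dotted_host("http://localhost"))    # False
print(has_dotted_host("http://example.com"))  # True
```

This rejects http://ddd and http://localhost, but it still accepts dotted names that are not public, such as internal domains or private IP addresses like 192.168.0.1.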

Mike Atlas
+1  A: 

Here is a simple experiment to see why that URL is valid:

0) use the dig or ping utility to get the IP address of google.com. I got: 74.125.53.100

1) Edit your /etc/hosts file (on Windows it is something like C:\Windows\system32\drivers\etc\hosts, and you might need to create it). In your hosts file, add a line like this:

74.125.53.100 ddd

Don't forget to save your edits.

2) In a web browser, go to this URL: http://ddd

3) You just accessed Google using the URL. That's why it's a valid URL.
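The experiment above works because name resolution is a separate step from URI syntax. As a small Python sketch (host_resolves is a hypothetical name), the same lookup that consults the hosts file and DNS can be asked directly whether a name resolves on the current machine:

```python
import socket


def host_resolves(hostname: str) -> bool:
    """True if the local resolver (hosts file, DNS, ...) knows `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the name could not be resolved;
        # UnicodeError: the name cannot be encoded as a host name.
        return False


# With the hosts-file entry from step 1 in place, host_resolves("ddd")
# would be expected to return True on that machine.
print(host_resolves("127.0.0.1"))
```

The result depends entirely on the machine running the check, which is the point: 'ddd' can be meaningful on one network and meaningless on another.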

steveha