Hi everyone,

I need a URI validation method. Strings like:

"http://www.google.com", "www.google.com", "google.com"

...must be validated as URIs, while plain strings like "google" must not. To check this, I have tried two approaches: UriBuilder and Uri.TryCreate().

The problem with UriBuilder is that it builds a Uri out of any string I give it. When I pass a plain string such as "google" to its constructor, it adds a scheme and returns "http://google/", which is not the behavior I want.

The problem with Uri.TryCreate() is that, while it works fine with "http://www.google.com" and "www.google.com", it does not validate "google.com" as a Uri.

I thought about checking whether the string starts with http:// or www and, if so, sending it to the UriBuilder class, but this does not help with "google.com", which must also be treated as a Uri.

How can I validate strings like "google.com" as a Uri, but not "google"? Checking the end of the string for .com, .net, .org doesn't seem flexible.

Thanks in advance.

Best regards,

Andrei

+1  A: 
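// Requires: using System; using System.Net;
public static bool IsValidUri(string uriString)
{
    Uri uri;
    // Default to HTTP when the string carries no scheme.
    if (!uriString.Contains("://")) uriString = "http://" + uriString;
    if (Uri.TryCreate(uriString, UriKind.RelativeOrAbsolute, out uri))
    {
        // DNS check: the host must resolve to at least one address.
        // Note: Dns.GetHostAddresses throws a SocketException when the host cannot be resolved.
        if (Dns.GetHostAddresses(uri.DnsSafeHost).Length > 0)
        {
            return true;
        }
    }
    return false;
}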
jojaba
The protocol can be [several other things](http://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples_of_absolute_URIs) other than HTTP.
slugster
@slugster: That's why he checks whether it already has a protocol. He only sets it to http if it doesn't, which is by far the most common case and a pretty safe default.
Mark
Thank you for your code. However, this code builds a Uri from a single word: if I pass "google", I get "http://google/" in return, which is not what I need. Also, I would like to avoid building the code logic on try/catch constructs.
Andrei
@andrei: Sounds like you need both a Uri Validator and a DNS checker. I updated the function to include both, and removed the try/catch.
jojaba
Thank you for the updated code. By the way, you can replace the strings "://" and "http" with the corresponding fields Uri.SchemeDelimiter and Uri.UriSchemeHttp. The problem with this code is that when the string is a plain word like "google", it throws a SocketException. So I have made a variation of your code that handles it:
Andrei
I think adding the protocol string should be out of this function's scope.
Zafer
+4  A: 

What you're looking for is Uri.IsWellFormedUriString. The following code returns true:

Uri.IsWellFormedUriString("google.com", UriKind.RelativeOrAbsolute)

If you set UriKind to Absolute, it returns false:

Uri.IsWellFormedUriString("google.com", UriKind.Absolute)

EDIT: See here for UriKind enumeration.

  • RelativeOrAbsolute: The kind of the Uri is indeterminate.
  • Absolute: The Uri is an absolute Uri.
  • Relative: The Uri is a relative Uri.

From MSDN documentation:

Absolute URIs are characterized by a complete reference to the resource (example: http://www.contoso.com/index.html), while a relative Uri depends on a previously defined base URI (example: /index.html).

Also, see here for Uri.IsWellFormedUriString. This method works in accordance with RFC 2396 and RFC 2732.

If you look at RFC 2396, you'll see that google.com is not a valid URI. In fact, www.google.com isn't either. But Appendix F, "Abbreviated URLs", explains this situation in detail:

The URL syntax was designed for unambiguous reference to network resources and extensibility via the URL scheme. However, as URL identification and usage have become commonplace, traditional media (television, radio, newspapers, billboards, etc.) have increasingly used abbreviated URL references. That is, a reference consisting of only the authority and path portions of the identified resource, such as www.w3.org/Addressing/ or simply the DNS hostname on its own. Such references are primarily intended for human interpretation rather than machine, with the assumption that context-based heuristics are sufficient to complete the URL (e.g., most hostnames beginning with "www" are likely to have a URL prefix of "http://"). Although there is no standard set of heuristics for disambiguating abbreviated URL references, many client implementations allow them to be entered by the user and heuristically resolved. It should be noted that such heuristics may change over time, particularly when new URL schemes are introduced. Since an abbreviated URL has the same syntax as a relative URL path, abbreviated URL references cannot be used in contexts where relative URLs are expected. This limits the use of abbreviated URLs to places where there is no defined base URL, such as dialog boxes and off-line advertisements.

What I understand from this is that Uri.IsWellFormedUriString accepts strings of the form www.abc.com as valid URI references. google.com is not accepted as an absolute URI, but it is accepted as a relative URI because it conforms to the relative path syntax (path segments may contain a dot).
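
To see the absolute versus relative distinction for the strings discussed in this thread, a small console check along these lines may help. It is only a sketch: the class name WellFormedDemo and the list of candidates are made up for illustration, and the only framework call used is Uri.IsWellFormedUriString.

```csharp
using System;

public static class WellFormedDemo
{
    public static void Main()
    {
        // Candidate strings taken from the question (illustrative list).
        string[] candidates = { "http://www.google.com", "www.google.com", "google.com", "google" };

        foreach (string candidate in candidates)
        {
            // Same string, checked once as "relative or absolute" and once as strictly absolute.
            bool relativeOrAbsolute = Uri.IsWellFormedUriString(candidate, UriKind.RelativeOrAbsolute);
            bool absolute = Uri.IsWellFormedUriString(candidate, UriKind.Absolute);

            Console.WriteLine("{0,-24} RelativeOrAbsolute={1,-5} Absolute={2}",
                candidate, relativeOrAbsolute, absolute);
        }
    }
}
```

Strings without a scheme can only pass the RelativeOrAbsolute check, which is why a single word passes it too.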

Also, as a side note, if you want to use a regular expression to parse a URI, you can read Appendix B, "Parsing a URI Reference with a Regular Expression", of the same RFC.
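
As a rough illustration, the regular expression given in RFC 2396, Appendix B can be used from C# as below. The pattern is the one from the RFC; the class and method names (UriReferenceParser, Split) are hypothetical helpers made up for this sketch. Note that this expression splits a URI reference into components and does not validate it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriReferenceParser
{
    // Pattern from RFC 2396, Appendix B.
    // Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static readonly Regex UriReference = new Regex(
        @"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");

    // Returns { scheme, authority, path }; components that are absent come back as "".
    public static string[] Split(string input)
    {
        Match match = UriReference.Match(input);
        return new[]
        {
            match.Groups[2].Value,
            match.Groups[4].Value,
            match.Groups[5].Value
        };
    }

    public static void Main()
    {
        foreach (string input in new[] { "http://google.com", "google.com" })
        {
            string[] parts = Split(input);
            Console.WriteLine("{0}: scheme='{1}' authority='{2}' path='{3}'",
                input, parts[0], parts[1], parts[2]);
        }
    }
}
```

With this pattern, "google.com" has no scheme and no authority, so the whole string ends up in the path component. That matches the point above that it is treated as a relative reference.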

Zafer
@Zafer - thank you for your answer. This method is interesting: it does validate "google.com", which is great. However, it also validates a single word ("google") as a well-formed Uri, which I don't need. Helpful answer nonetheless.
Andrei
@Andrei: I've updated my answer. The answer lies in RFC 2396.
Zafer
Thanks for this. I have read further about Uri.IsWellFormedUriString and I think I understand why it validates "google" as a valid Uri. So what I need, I guess, is a way to check whether the string ends with .com, .net, etc. I am reluctant to use regular expressions for this because they can have flaws: if someone invents a popular extension like ".zedo" in the future, my regex will not catch it, since it will only handle known endings (.net, .com, etc.).
Andrei
@Andrei: Regular expressions may get you into trouble in the future, as you mentioned, so sticking with the standard methods provided by the framework is better. I suggest you insert http:// at the beginning of the string if it doesn't already start with it, so that google.com becomes the absolute URI http://google.com.
Zafer
+1  A: 

Use a regular expression for this.

Sample code for validating a URL:

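// Requires: using System.Text.RegularExpressions;
// candidateUrl is a placeholder name for the string you want to validate.
// The pattern is anchored with ^ and $ so that the whole string must match.
// This is a syntax check only: a bare word such as "google" still passes.
Regex RgxUrl = new Regex("^(([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?$");
    if (RgxUrl.IsMatch(candidateUrl))
    {
      //url is valid
    }
    else
    {
      //url is not valid
    }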
AEMLoviji
+1  A: 

This is a variant of the code from Jojaba, whom I thank for the DNS checker; that was what I needed. The only problem is that it uses a try/catch in its logic, which I was hoping to avoid.

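        // Requires: using System; using System.Net; using System.Net.Sockets;
        public static Uri StringToAbsoluteUri(string uriString)
        {
            Uri resultUri;

            // Default to HTTP when the string carries no scheme.
            if (!uriString.Contains(Uri.SchemeDelimiter))
                uriString = Uri.UriSchemeHttp + Uri.SchemeDelimiter + uriString;

            // UriKind.Absolute: DnsSafeHost is only available on absolute URIs.
            if (Uri.TryCreate(uriString, UriKind.Absolute, out resultUri))
            {
                try
                {
                    // The host must resolve to at least one address.
                    IPAddress[] addressesOfHost = Dns.GetHostAddresses(resultUri.DnsSafeHost);
                    if (addressesOfHost.Length > 0)
                    {
                        return resultUri;
                    }
                }
                catch (SocketException)
                {
                    // Thrown when the host cannot be resolved.
                    return null;
                }
            }

            // Either parsing failed or the host has no addresses.
            return null;
        }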
Andrei