Hi, I have seen many regular expressions for URL validation. In my case the URLs are simpler, so the regex should be tighter:

Valid URL prefixes look like:

  • http[s]://[www.]addressOrIp[.something]/PageName.aspx[?]

This describes a prefix. I will be appending ?x=a&y=b&z=c later. I want to check whether the web page is live before accessing it, but even before that I want to make sure that the URL is properly configured. I want to treat the "bad URL" and "host is down" conditions differently, although when in doubt I'd rather give a "host is down" message, because that is the ultimate test anyway. In other words, the regex does not need to be too aggressive; I just want it to cover, say, 95% of the cases.

This is C#-centric, so Perl regex extensions are not helpful to me; let's stick to the lowest common denominator.

Thanks!

+1  A: 

Use System.Uri instead.

Then you will be able to work with your URL parts such as Host, Scheme, PathAndQuery etc and check the necessary conditions.
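As a rough illustration of which part of a URL each property covers, here is a minimal sketch. The class name `UriPartsDemo`, the method `Describe`, and the `www.example.com` URL are made up for this example; only the `System.Uri` properties themselves come from the framework.

```csharp
using System;

public static class UriPartsDemo
{
    // Hypothetical helper: lists the parts of an absolute URL, one per line.
    public static string Describe(string url)
    {
        // The Uri constructor throws UriFormatException on malformed input.
        var uri = new Uri(url);
        return string.Join("\n", new[]
        {
            "Scheme: " + uri.Scheme,
            "Host: " + uri.Host,
            "AbsolutePath: " + uri.AbsolutePath,
            "Query: " + uri.Query,
            "PathAndQuery: " + uri.PathAndQuery
        });
    }
}
```

Calling `UriPartsDemo.Describe("http://www.example.com/MyPage.aspx?a=b")` should report the scheme as `http`, the host as `www.example.com`, the absolute path as `/MyPage.aspx`, the query as `?a=b`, and the path and query as `/MyPage.aspx?a=b`.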

Alex
+3  A: 

You should use the Uri class:

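Uri uri;

// Assumes this runs inside a method that returns bool; str is the candidate URL.
// Reject anything that does not parse as an absolute URI.
if (!Uri.TryCreate(str, UriKind.Absolute, out uri))
    return false; // Bad bad bad!!!
// Accept only the http and https schemes.
if (uri.Scheme != "http" && uri.Scheme != "https")
    return false; // Bad bad bad!!!
// Require at least one dot in the host name.
if (uri.Host.IndexOf('.') < 0)
    return false; // Bad bad bad!!!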
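The checks above could be wrapped into a single helper along these lines. This is only a sketch: the names `UrlPrefixValidator` and `IsValidPagePrefix` are invented here, and the final `.aspx` test is an assumption based on the `PageName.aspx` pattern in the question, not something the `Uri` class enforces.

```csharp
using System;

public static class UrlPrefixValidator
{
    // Hypothetical helper: returns true when the candidate looks like
    // http[s]://host.with.dot/.../PageName.aspx, with or without a query.
    public static bool IsValidPagePrefix(string candidate)
    {
        Uri uri;

        // Reject anything that does not parse as an absolute URI.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
            return false;

        // Accept only http and https.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Require a dot in the host (a dotted name or an IPv4 address).
        if (uri.Host.IndexOf('.') < 0)
            return false;

        // Assumption from the question: the path must end in .aspx.
        // AbsolutePath excludes the query string, so a trailing "?" is fine.
        return uri.AbsolutePath.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase);
    }
}
```

With this sketch, `http://www.somehost.com/MyPage.aspx?a=b` passes, while `ftp://www.somehost.com/MyPage.aspx` and `http://localhost/MyPage.aspx` are rejected. Note that the dot check also rejects dotless intranet host names, which may or may not be what you want.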
SLaks
Cool! I suppose I can also do `IndexOf(".aspx")`. Was that the purpose of looking for a dot? For instance: `http://www.somehost.com/MyPage.aspx?a=b` contains two dots in it. In this very example, which part will `uri.Host` cover?
Hamish Grubijan
`Uri.Host` will be `www.somehost.com`. The `IndexOf` call requires a dot in the host name, which forces a TLD. You can add whatever checks you want; see the documentation.
SLaks
OK, thank you!
Hamish Grubijan