How can I write a RE which validates the URLs without the scheme:
Pass:
- www.example.com
- example.com
Fail:
How can I write a RE which validates the URLs without the scheme:
Pass:
Fail:
My guess is
/^[\p{Alnum}-]+(\.[\p{Alnum}-]+)+$/
In more primitive RE syntax that would be
/^[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)+$/
Or even more primitive still:
/^[0-9A-Za-z-][0-9A-Za-z-]*\.[0-9A-Za-z-][0-9A-Za-z-]*(\.[0-9A-Za-z-][0-9A-Za-z-]*)*$/
URL syntax is quite complex, you need to narrow it down a bit. You can match anything.ext, if that is enough:
^[a-zA-Z0-9.]+\.[a-zA-Z]{2,4}$
^[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(/.*)?$
":8080"
)Thoughts:
$
"If your regex flavor supports it, you can shorten the above to:
^[A-Za-z\d][\w.-]+(:\d+)?(/.*)?$
Be aware that \w
may include Unicode characters in some regex flavors. Also, \w
includes the underscore, which is invalid in host names. An explicit approach like the first one would be safer.
If you're trying to do this for some real code, find the URL parsing library for your language and use that. If you don't want to use it, look inside to see what it does.
The thing that you are calling "resource" is known as a "scheme". It's documented in RFC 1738 which says:
[2.1] ... In general, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
And, later in the BNF,
scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
So, if a scheme is there, you can match it with:
/^[a-z0-9+.-]+:/i
If that matches, you have what the URL syntax considers a scheme and your validation fails. If you have strings with port numbers, like www.example.com:80, then things get messy. In practice, I haven't dealt with schemes with -
or .
, so you might add a real world fudge to get around that until you decide to use a proper library.
Anything beyond that, like checking for existing and reachable domains and so on, is better left to a library that's already figured it all out.
Thanks guys, I think I have a Python and a PHP solution. Here they are:
Python Solution:
import re
url = 'http://www.foo.com'
p = re.compile(r'^(?!http(s)?://$)[A-Za-z][A-Za-z0-9.-]+(:\d+)?(/.*)?$')
m = p.search(url)
print m # m returns _sre.SRE_Match if url is valid, otherwise None
PHP Solution:
$url = 'http://www.foo.com';
preg_match('/^(?!http(s)?:\/\/$)[A-Za-z][A-Za-z0-9\.\-]+(:\d+)?(\/\.*)?$/', $url);