Can a single regex be used to validate URLs and capture all of their parts? I have been working on one, and what I have come up with so far is:
(?:(?P<scheme>[a-z]*?)://)?(?:(?P<username>.*?):?(?P<password>.*?)?@)?(?P<hostname>.*?)/(?:(?:(?P<path>.*?)\?)?(?P<file>.*?\.[a-z]{1,6})?(?:(?:(?P<query>.*?)#?)?(?P<fragment>.*?)?)?)?
However, this does not work. It should match all of the following examples:
http://username:[email protected]/path?arg=value#anchor
http://www.domain.com/
http://www.domain.co.uk/
http://www.yahoo.com/
http://www.google.au/
https://username:[email protected]/
ftp://user:[email protected]/path/
https://www.blah1.subdomain.domain.tld/
domain.tld/#anchor
domain.tld/?query=123
domain.co.uk/
domain.tld
http://www.domain.tld/index.php?var1=blah
http://www.domain.tld/path/to/index.ext
mailto://[email protected]
and provide a named capture for all the components:
scheme, e.g. http, https, ftp, ftps, callto, mailto, or any other scheme not listed
username
password
hostname, including subdomains, domain, and TLD
path, e.g. /images/profile/
filename, e.g. file.ext
query string, e.g. ?foo=bar&bar=foo
fragment, e.g. #anchor
The hostname is the only mandatory field.
We can assume that the input comes from a form specifically asking for a URL and will not be used to find links in text.
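
To make the requirements above concrete, here is a minimal sketch in Python of the kind of pattern I am after. It is not a definitive solution. It assumes that a filename is the last path segment and ends in a dot plus 1 to 6 alphanumeric characters, that ports are not supported, and that "validation" only means the string has the right overall shape. The names URL_RE and parse_url, and the example.com URL in the usage lines, are made up for illustration.

```python
import re

# Each component uses a negated character class instead of ".*?", so a
# component cannot run past the delimiter that ends it.
URL_RE = re.compile(
    r"^"
    r"(?:(?P<scheme>[a-z][a-z0-9+.-]*)://)?"                      # optional scheme://
    r"(?:(?P<username>[^:@/?#]+)(?::(?P<password>[^@/?#]*))?@)?"  # optional user[:password]@
    r"(?P<hostname>[^:@/?#]+)"                                    # mandatory hostname
    r"(?P<path>/[^?#]*?)?"                                        # optional path (lazy)
    r"(?P<file>[^/?#]+\.[a-z0-9]{1,6})?"                          # optional file.ext (assumed shape)
    r"(?:\?(?P<query>[^#]*))?"                                    # optional ?query
    r"(?:#(?P<fragment>.*))?"                                     # optional #fragment
    r"$",
    re.IGNORECASE,
)


def parse_url(url):
    """Return a dict of the named components, or None if the string does not match."""
    match = URL_RE.match(url)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Components that are absent from the input come back as None.
    print(parse_url("http://username:password@www.example.com/path?arg=value#anchor"))
    print(parse_url("domain.tld/#anchor"))
    print(parse_url("http://www.domain.tld/path/to/index.ext"))
```

Because the path group is lazy and the pattern is anchored at both ends, a trailing segment such as index.ext ends up in the file group, while a trailing segment without an extension stays in the path group.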