Validating URIs against RFC 3986 is fairly simple. You can use a regular expression like:
/^ # Start at the beginning of the text
([a-z][a-z0-9\+\-\.]*):\/\/ # The scheme
(?: # Userinfo (optional)
(?:(?:[\w\.\-~!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*
(?:[\w\.\-~!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@
)?
(?: # The domain
(?:[a-z0-9\-\.]|%[0-9a-f]{2})+ # Domain name or IPv4
|(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\]) # or IPv6
)
(?::[0-9]+)? # Server port number (optional)
(?:[\/\?]
(?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2}) # The path (optional)
*)?
$/xi
But this doesn't work for international characters such as those found in Internationalized Domain Names (IDNs). For example, http://例え.テスト/メインページ.
Using something like
filter_var($url, FILTER_VALIDATE_URL);
doesn't work for these either. The issue is the characters themselves: both approaches accept only ASCII.
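To make the failure concrete, here is a minimal sketch that runs filter_var on two URLs. The ASCII URL http://example.com/main_page and the variable names are assumptions added for contrast; the second URL is the one from the example above. No extra flags are passed, since FILTER_VALIDATE_URL already requires a scheme (FILTER_FLAG_SCHEME_REQUIRED was removed in PHP 8.0).

```php
<?php
// Hypothetical ASCII URL, added only for contrast with the IDN example.
$asciiUrl = 'http://example.com/main_page';
// The internationalized URL from the question.
$idnUrl = 'http://例え.テスト/メインページ';

// filter_var returns the input string when it validates, or false when it does not.
$asciiResult = filter_var($asciiUrl, FILTER_VALIDATE_URL);
$idnResult = filter_var($idnUrl, FILTER_VALIDATE_URL);

var_dump($asciiResult); // string(28) "http://example.com/main_page"
var_dump($idnResult);   // bool(false)
```

The second call is rejected only because the host and path contain non-ASCII characters; the URL is otherwise well formed.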
Is there a good way to validate URIs in PHP?