Validating URIs against RFC 3986 is fairly simple. You can use a regular expression like:

/^                                                     # Start at the beginning of the text
([a-z][a-z0-9\+\-\.]*):\/\/                            # The scheme
(?:                                                    # Userinfo (optional)                                              
  (?:(?:[\w\.\-\+!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*
  (?:[\w\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@
)?
(?:                                                    # The domain
  (?:[a-z0-9\-\.]|%[0-9a-f]{2})+                       # Domain name or IPv4
  |(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\])        # or IPv6
)
(?::[0-9]+)?                                           # Server port number (optional)
(?:[\/\?]
  (?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2}) # The path (optional) 
*)?
$/xi

However, this doesn't work for international characters such as those found in Internationalized Domain Names (IDNs). For example: http://例え.テスト/メインページ

Using something like

filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);

doesn't work for these either. The problem is the non-ASCII characters in the host and path.
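
To make the failure concrete, here is a minimal sketch comparing an ASCII URL with the IDN example. The ASCII URL is an assumption chosen for illustration. FILTER_FLAG_SCHEME_REQUIRED is left out because it was deprecated in PHP 7.3 and removed in PHP 8.0; the scheme is required by FILTER_VALIDATE_URL regardless.

```php
<?php
// Illustrative inputs: an assumed ASCII URL and the IDN example from above.
$ascii = 'http://example.com/main-page';
$idn   = 'http://例え.テスト/メインページ';

// filter_var() returns the URL string on success and false on failure.
var_dump(filter_var($ascii, FILTER_VALIDATE_URL) !== false); // bool(true)
var_dump(filter_var($idn, FILTER_VALIDATE_URL) !== false);   // bool(false)
```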

Is there a good way to validate URIs in PHP?

+1  A: 

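With preg_match, \pL matches any Unicode letter and \pN matches any Unicode number. So replace a-z with \pL and 0-9 with \pN, and add the u modifier so the pattern and subject are treated as UTF-8. See Regular Expression Details for more information.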
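
As a rough sketch of that substitution (not a complete validator), the following applies \pL and \pN to a shortened version of the pattern from the question. The function name looksLikeIri is hypothetical, the userinfo and IPv6 parts are dropped for brevity, and the scheme is deliberately kept ASCII-only.

```php
<?php
// Hypothetical helper: a simplified, Unicode-aware variant of the question's pattern.
// Userinfo and IPv6 literals are omitted to keep the sketch short.
function looksLikeIri(string $url): bool
{
    $pattern = '/^'
        . '[a-z][a-z0-9+.\-]*:\/\/'          // scheme (kept ASCII-only)
        . '(?:[\pL\pN.\-]|%[0-9a-f]{2})+'    // host: Unicode letters and numbers, dot, hyphen
        . '(?::[0-9]+)?'                     // optional port
        . '(?:[\/?]'                         // optional path or query start
        . '(?:[\pL\pN_#!:.?+=&@$\'~*,;\/()\[\]\-]|%[0-9a-f]{2})*'
        . ')?'
        . '$/iu';                            // i: case-insensitive, u: UTF-8 mode

    return preg_match($pattern, $url) === 1;
}

var_dump(looksLikeIri('http://例え.テスト/メインページ'));   // bool(true)
var_dump(looksLikeIri('http://example.com/index.php?a=1')); // bool(true)
var_dump(looksLikeIri('http://exa mple.com/'));             // bool(false)
```

Note that \pL and \pN do not cover combining marks, so input that is not in a composed normalization form may need \pM added to the character classes as well.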

grom