A: 

What is wrong with a website like RegExLib.com (the "url" section)?

You should find what you need there and can test it for yourself.

Anyway, this regex validates what you want and excludes what you do not want.

(?ms)^(https?|ftp|telnet):\/\/((?:(?:(?=[^\r\n]*@)\w|-)+(?:(?::)(?:\w|-)+)?)?)@?((?:(?:(?:\w|-)+)\.)+(?:\w|-)+)(\:\d+)?((?:(?:/(?:\w|-)+(?:\.(?:\w|-)+)?)+)?)((?:\?(?:(?:\w|-)+\=(?:\w|[\.\-\*\:\+\#])*\&?)+)*)$

with:

  • group 1: protocol
  • group 2: username[:password]
  • group 3: domain (www.xxx)
  • group 4: port (:NNNN), empty when no port is given
  • group 5: address (XX/yyy/zzzz)
  • group 6: parameters (?key1a=value1a&key2a=value2a?key1b=value1b&key2b=value2b...)
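
As a rough illustration of the group layout above, here is a minimal Python sketch that applies the pattern and labels each capture. The helper name `split_url`, the dictionary keys, and the sample URLs are assumptions made for this example; they are not part of the original answer.

```python
import re

# The pattern from the answer, split across string literals for readability only.
URL_RE = re.compile(
    r"(?ms)^(https?|ftp|telnet):\/\/"                        # group 1: protocol
    r"((?:(?:(?=[^\r\n]*@)\w|-)+(?:(?::)(?:\w|-)+)?)?)@?"    # group 2: username[:password]
    r"((?:(?:(?:\w|-)+)\.)+(?:\w|-)+)"                       # group 3: domain
    r"(\:\d+)?"                                              # group 4: port, with its colon
    r"((?:(?:/(?:\w|-)+(?:\.(?:\w|-)+)?)+)?)"                # group 5: address (path)
    r"((?:\?(?:(?:\w|-)+\=(?:\w|[\.\-\*\:\+\#])*\&?)+)*)$"   # group 6: parameters
)


def split_url(url):
    """Return the six captures as a dict, or None if the pattern rejects the URL."""
    m = URL_RE.match(url)
    if m is None:
        return None
    # Hypothetical key names, chosen to mirror the group list above.
    names = ("protocol", "userinfo", "domain", "port", "path", "params")
    return dict(zip(names, m.groups()))


print(split_url("ftp://user:pw@host.example.org:21/pub"))
print(split_url("http://example"))  # rejected: group 3 requires a dot in the host
```

An optional group that does not take part in the match (here, the port) comes back as `None` rather than an empty string.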
VonC
A: 

Here you go:

/^(https?|ftp|telnet):\/\/((?:[a-z0-9@:.-]|%[0-9A-F]{2}){3,})(?::(\d+))?((?:\/(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})*)*)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?$/i

This was based on code found here: http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/ with several edits.

You can test it on this page: http://regexpal.com/ (paste the regex as:

^(https?|ftp|telnet):\/\/((?:[a-z0-9@:.-]|%[0-9A-F]{2}){3,})(?::(\d+))?((?:\/(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})*)*)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?

and select "case insensitive")
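
If an online tester is not at hand, the same check can be sketched in Python. This is only an illustration: the `/.../i` delimiters are dropped, the case-insensitive flag is passed as `re.IGNORECASE`, a trailing `$` is kept as in the full regex, and the name `URI_RE` and the sample URLs are assumptions.

```python
import re

# Same pattern as above, without the /.../i delimiters; the "i" flag
# becomes re.IGNORECASE. Split across literals for readability only.
URI_RE = re.compile(
    r"^(https?|ftp|telnet):\/\/((?:[a-z0-9@:.-]|%[0-9A-F]{2}){3,})(?::(\d+))?"
    r"((?:\/(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})*)*)"
    r"(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?"
    r"(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9A-F]{2})*))?$",
    re.IGNORECASE,
)

m = URI_RE.match("http://example.com/a?x=1#frag")
print(m.groups())

# A host without a dot is accepted by this pattern.
print(URI_RE.match("http://example") is not None)

# The host class contains ':' and digits, so with this pattern an explicit
# port is captured as part of group 2 and the port group stays empty.
print(URI_RE.match("http://example.com:8080/").group(2, 3))
```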

Good luck!

Gdeglin
Thanks, but this also matches http://example
@friex - please note that http://example is a perfectly valid URL.
depesz
@depesz Yeah, I know, but I don't want it to pass validation.
Hmm, so you need to not only validate whether the URL is valid, but whether the host is a fully-qualified domain name (FQDN)? Would it be possible to use the provided regex in combination with a name lookup to validate the host? Or would it be enough to just require a dot somewhere in the host?
Jeff
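
Both options raised in this comment can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete FQDN check: the function names `host_has_dot` and `host_resolves` are made up for the example, and the lookup variant needs network access and only says whether the name resolves at the moment of the call.

```python
import socket
from urllib.parse import urlsplit


def host_has_dot(url):
    """Cheap check: the host part contains a dot that is not leading or trailing."""
    host = urlsplit(url).hostname or ""
    return "." in host.strip(".")


def host_resolves(url):
    """Stricter check: ask the resolver whether the host name can be looked up."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; treat any failure as "no".
        return False
    return True


print(host_has_dot("http://example"))             # host has no dot
print(host_has_dot("http://www.example.com/x"))   # host has a dot
```

The dot check is a pure string test, so it can run right after a regex match; the lookup is slower and can fail for reasons unrelated to the URL itself.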
A: 

This is not really a task for a regular expression.

I mean, it's possible to write a fully working regexp-based validator, but it makes about as much sense as writing a regexp to validate an email address.

I don't know what language you're using, but I would guess that most modern languages have a library or module to parse and validate URLs.
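
In Python, for instance, the standard library's `urllib.parse` does the parsing. The sketch below is one possible reading of "valid", not a definitive validator: `urlsplit` is lenient by itself, so the scheme whitelist, the host requirement, and the function name `is_valid_url` are assumptions added for this example.

```python
from urllib.parse import urlsplit

# Assumed whitelist, mirroring the schemes used in the regexes in this thread.
ALLOWED_SCHEMES = {"http", "https", "ftp", "telnet"}


def is_valid_url(url):
    """Return True if the URL parses, uses an allowed scheme, and has a host."""
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    # urlsplit lowercases the scheme, so the comparison is case-insensitive.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


for candidate in (
    "http://www.example.com/a?b=c",
    "http://example",
    "gopher://example.com",
    "http://example.com:99999/",
):
    print(candidate, is_valid_url(candidate))
```

Once parsed, the individual parts (`parts.hostname`, `parts.path`, `parts.query`) are available as attributes, so further rules can be layered on without touching a regex.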

depesz