How do you write a regex that matches only valid URIs? The description of URIs can be found here: http://en.wikipedia.org/wiki/URI_scheme. It doesn't need to extract any parts, just test whether a URI is valid.

(Preferred format is a .NET regular expression, .NET version 1.1.)

  • It doesn't need to check for a known protocol (scheme), just a valid one.
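Checking for a syntactically valid scheme, rather than a known one, only needs the RFC 3986 scheme rule: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. A minimal Python sketch of just that rule (the name `has_valid_scheme` is hypothetical, and the rest of the URI is not checked here):

```python
import re

# RFC 3986, section 3.1: a scheme starts with a letter, followed by any mix
# of letters, digits, "+", "-" and ".". It is terminated by ":".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def has_valid_scheme(uri):
    """Return True if `uri` begins with a syntactically valid scheme."""
    return SCHEME.match(uri) is not None

print(has_valid_scheme("made-up+scheme.v2:anything"))  # True: unknown but valid
print(has_valid_scheme("1http://example.com"))         # False: starts with a digit
```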

Current Solution:

^([a-zA-Z0-9+.-]+):(//([a-zA-Z0-9-._~!$&'()*+,;=:]*)@)?([a-zA-Z0-9-._~!$&'()*+,;=]+)(:(\d*))?(/?[a-zA-Z0-9-._~!$&'()*+,;=:/]+)?(\?[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?(#[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?$

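A quick way to evaluate the current solution is to run it against a few inputs. The Python sketch below uses the pattern with its duplicated tail removed and written as a raw string, so `\d` and `\?` reach the regex engine unescaped (it assumes Python's `re` and .NET treat these constructs the same way, which is worth re-checking under .NET 1.1). It suggests the pattern is too strict: `//` only appears inside the optional user-info group, so a plain `http://example.com` is rejected, and `@` is not allowed after the host, so `mailto:someone@example.com` is rejected too.

```python
import re

# The question's "current solution", deduplicated, split into adjacent
# raw-string literals purely for readability.
CURRENT = re.compile(
    r"^([a-zA-Z0-9+.-]+):(//([a-zA-Z0-9-._~!$&'()*+,;=:]*)@)?"
    r"([a-zA-Z0-9-._~!$&'()*+,;=]+)(:(\d*))?"
    r"(/?[a-zA-Z0-9-._~!$&'()*+,;=:/]+)?"
    r"(\?[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?"
    r"(#[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?$"
)

for candidate in ["http://user@example.com/path",  # matches
                  "http://example.com",            # rejected: no "@" after "//"
                  "mailto:someone@example.com"]:   # rejected: "@" not allowed
    print(candidate, bool(CURRENT.match(candidate)))
```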
A: 

Are there specific URIs you care about, or are you trying to find a single regex that validates STD 66 (RFC 3986)?

I was going to point you to this regex for parsing a URI. You could then, in theory, check to see if all of the elements you care about are there.
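For the parse-then-check approach, one well-known parsing regex is the one given in RFC 3986, Appendix B (not necessarily the one meant here). It never rejects anything; it only splits a string into scheme, authority, path, query and fragment. A minimal Python sketch, where `split_uri`, `has_required_parts` and the default set of required components are hypothetical choices:

```python
import re

# Component-splitting regex from RFC 3986, Appendix B. Every part is optional,
# so it matches any string; it splits but does NOT validate.
RFC3986_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(s):
    m = RFC3986_SPLIT.match(s)
    # Groups 2, 4, 5, 7 and 9 hold the five components; absent ones are None.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

def has_required_parts(s, required=("scheme", "authority")):
    # Hypothetical policy: accept only if each named component is non-empty.
    parts = split_uri(s)
    return all(parts[name] for name in required)

print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(has_required_parts("mailto:someone@example.com"))  # False: no authority
```

Because the split itself accepts anything, all of the actual validation lives in whatever checks are applied to the components afterwards.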

But I think bdukes's answer is better.

Mark Biek
+6  A: 

Does Uri.IsWellFormedUriString work for you?

bdukes
+1  A: 

This is going to get extremely complicated; I'm going to second bdukes's suggestion of using the System.Uri class and its methods to perform validation.

OwenP
A: 

@bdukes I'm using .NET 1.1, and Uri.IsWellFormedUriString was only added in .NET 2.0.

alumb
+2  A: 

This site looks promising: http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

They propose the following regex:

/^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$/i
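To try the proposed pattern outside JavaScript, drop the `/.../i` delimiters and pass the case-insensitive flag separately (in .NET that would be `RegexOptions.IgnoreCase`). A Python sketch, with the pattern split over adjacent raw-string literals only for readability:

```python
import re

# The snipplr pattern, unchanged apart from the /.../i delimiters, which are
# replaced by re.IGNORECASE. "%" must be followed by exactly two hex digits.
URI = re.compile(
    r"^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?"
    r"((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?"
    r"(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?"
    r"|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+"
    r"(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)"
    r"(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?"
    r"(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$",
    re.IGNORECASE,
)

for candidate in ["http://user:pass@example.com:8080/path/to?query=1#frag",
                  "mailto:someone@example.com",
                  "http://example.com/%zz",   # invalid percent-escape
                  "http://exa mple.com"]:     # space is not allowed
    print(candidate, bool(URI.match(candidate)))
```

Two caveats: the scheme part `[a-z0-9+.-]+` also accepts a scheme that starts with a digit (e.g. `1http://example.com`), which RFC 3986 does not allow, and `$` in both Python and .NET also matches just before a trailing newline, so `\Z` (Python) or `\z` (.NET) is stricter for validation.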
Daren Thomas