I need to match all valid URLs except:
http://www.w3.org
http://w3.org/foo
http://www.tempuri.org/foo
In general: all URLs except those on certain domains.
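For reference, the intended behaviour can be stated without a regex, as a check on the hostname. This is a minimal Python sketch. It assumes the excluded set is just w3.org and tempuri.org plus their subdomains, and it uses the standard library's `urllib.parse.urlsplit`. `EXCLUDED_DOMAINS` and `is_allowed` are hypothetical names for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list, taken from the example URLs above.
EXCLUDED_DOMAINS = {"w3.org", "tempuri.org"}

def is_allowed(url: str) -> bool:
    """Return False if the URL's host is an excluded domain or a subdomain of one."""
    # urlsplit(...).hostname is already lower-cased (None if there is no host).
    host = urlsplit(url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)

for url in ["http://www.w3.org", "http://w3.org/foo",
            "http://www.tempuri.org/foo", "http://example.com/bar"]:
    print(url, is_allowed(url))
```

Comparing whole labels (`host == d` or a `"." + d` suffix) avoids rejecting look-alike hosts such as `notw3.org`.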
Here is what I have so far:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
will match URLs that are close enough for my needs (though by no means all valid URLs!). Thanks to http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/ for that one.
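The base pattern can be checked quickly in Python's `re` module. The flavor is an assumption, since the question does not say which regex engine is in use. The sample text is made up for illustration. As expected, this pattern excludes nothing yet, so the w3.org URL is matched too.

```python
import re

# The pattern from the question, written as a raw string so the backslashes survive.
URL_RE = re.compile(r"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?")

text = "See http://example.com:8080/docs/index.html?x=1 and https://w3.org/foo for details."
found = [m.group(0) for m in URL_RE.finditer(text)]
print(found)  # ['http://example.com:8080/docs/index.html?x=1', 'https://w3.org/foo']
```

The outer `+` on the first group is redundant: `([-\w\.]+)+` matches the same strings as `([-\w\.]+)`.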
https?://www\.(?!tempuri|w3)\S*
will match all URLs with www., but not those in the tempuri or w3 domains.
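The behaviour of the www-only pattern can be shown with a small Python sketch. It assumes Python's `re` semantics and uses a made-up list of URLs. Two side effects show up. URLs without `www.` are never matched, whether or not they are excluded. And because the lookahead only checks a prefix, `www.w3schools.com` is rejected too.

```python
import re

# The question's www-only pattern: the negative lookahead sits right after "www.",
# so it rejects any host whose next characters are "tempuri" or "w3".
WWW_RE = re.compile(r"https?://www\.(?!tempuri|w3)\S*")

urls = [
    "http://www.w3.org",           # rejected by the lookahead
    "http://w3.org/foo",           # never matched: no "www."
    "http://www.tempuri.org/foo",  # rejected by the lookahead
    "http://www.example.com/bar",  # matched
    "http://example.com/bar",      # never matched: no "www."
    "http://www.w3schools.com",    # rejected: starts with "w3"
]
kept = [u for u in urls if WWW_RE.fullmatch(u)]
print(kept)  # ['http://www.example.com/bar']
```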
And I really want
https?://([-\w\.]+)(?!tempuri|w3)\S*
to work, but as far as I can tell, it matches every http:// string.
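A likely explanation, assuming Python-style backtracking semantics: `[-\w\.]+` greedily consumes the whole host first. The lookahead is therefore tested after the host, where the remaining text is `/foo` or nothing, never `tempuri` or `w3`. The sketch below shows this and tries one possible fix, which is an assumption and not the only way. It moves the negative lookahead to just after `://` so it inspects the host before anything is consumed. It also only rejects whole domains (`w3.org`, `tempuri.org`, or subdomains of them). `BROKEN_RE`, `FIXED_RE`, and the URL list are illustrative.

```python
import re

# The question's pattern: the host is consumed before the lookahead runs.
BROKEN_RE = re.compile(r"https?://([-\w\.]+)(?!tempuri|w3)\S*")

# Sketch of a fix: check the host right after "://". The lookahead rejects hosts
# that are w3.org / tempuri.org or end in .w3.org / .tempuri.org.
FIXED_RE = re.compile(
    r"https?://"
    r"(?!(?:[-\w]+\.)*(?:w3|tempuri)\.org(?![-\w.]))"  # excluded domains
    r"[-\w.]+(?::\d+)?(?:/\S*)?",                      # host, optional port, optional path
    re.IGNORECASE,
)

urls = [
    "http://www.w3.org",
    "http://w3.org/foo",
    "http://www.tempuri.org/foo",
    "http://example.com/bar",
    "http://www.w3schools.com",
]
broken = [u for u in urls if BROKEN_RE.fullmatch(u)]
fixed = [u for u in urls if FIXED_RE.fullmatch(u)]
print(broken)  # all five URLs
print(fixed)   # ['http://example.com/bar', 'http://www.w3schools.com']
```

The trailing `(?![-\w.])` stops the exclusion from firing on hosts that merely contain the domain, such as `w3.org.example.net`.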
Gah, I should just do this in something higher up the Chomsky hierarchy!