tags:

views:

71

answers:

3

((https?|ftp)\:\/\/|www.)(\S+[^.*])

I would like this expression to check for . in succession to each other. If it finds two or more periods back to back, the expression should fail. On the other hand, if it succeeds, I want it to match every character and/or symbol up until the first white space encountered.

In other words: www.yahoo..com should fail

On a related note: I realize that this expression is very basic in terms of judging valid URL structure. I have another "more intelligent" regular expression in place that precedes the one above. The purpose of the posted one is meant to check the validity of the URL that is passed from the initial regular expression via preg_match_all.

A: 

Using negative lookahead is an easy way if your engine supports it:

(?!.*\.\.)((https?|ftp)\:\/\/|www.)(\S+[^.*])

Otherwise, you have to be more specific:

^((https?|ftp)\:\/\/|www.)((\.[^.]|[^.\s])+[^.*])($|\s+)
Greg Bacon
This works well. I also found a bug in my initial expression that I'll need to fix. Thanks!
Mike
+3  A: 

You may awnt to check out FILTER_VALIDATE_URL with http://php.net/manual/en/book.filter.php instead of using Regex to validate your URLS.

Here's example usage:

$url = "http://www.example.com";

if(!filter_var($url, FILTER_VALIDATE_URL))
  {
  echo "URL is not valid";
  }
else
  {
  echo "URL is valid";
  }
Erik
I will look into this. Thanks!
Mike
For some reason, this seems to work the opposite of how it should for me. The valid URLs are labeled as invalid and vice versa. I'll do some more checking, but if it consistently does this, I'll just reverse the true or false returns.
Mike
Erik
I copied and pasted yours and then went to w3schools to look at the same function. They match up and I'm getting the same result.
Mike
Odd, That's where I grabbed it from, works fine for me =x and I've used the filter before.
Erik
A: 

You can do something like this:

((https?|ftp)\:\/\/|www.)((?:[\w\-]+\.)*[\w\-]+)

This will not yet check for valid URLs, even if you skip double dots. I'd advise not to use regex if the language you're using (PHP?) has other means of validating an URL.

The RFC states the following:

; URL schemeparts for ip based protocols:
ip-schemepart = "//" login [ "/" urlpath ]
login = [ user [ ":" password ] "@" ] hostport
hostport = host [ ":" port ]
host = hostname | hostnumber
hostname = *[ domainlabel "." ] toplabel
domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit = alpha | digit
hostnumber = digits "." digits "." digits "." digits
port = digits
user = *[ uchar | ";" | "?" | "&" | "=" ]
password = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath = *xchar ; depends on protocol see section 3.1

; HTTP
httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
hpath = hsegment *[ "/" hsegment ]
hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

; Miscellaneous definitions
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
"i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
"y" | "z"
hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha = lowalpha | hialpha
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex
unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit
Lucero
I am using PHP. I am using regular expression because there's an additional step in my logic that I am creating where links local to my domain stay within the same window and links outside my domain open a new tab and sends the user to the location via the new tab.
Mike