I know there are tons of questions on here about validating a web address with something like this:

/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i

The only problem is that not everybody types the http:// (or whatever scheme comes first). I wanted to keep using preg_match(), but treat the http:// as optional rather than required. I modified the pattern to this, but now it rejects the URL if it does have http:// in it:

/^[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i

I was hoping more to validate it on these conditions

  • If it has http:// or www, just ignore that part
  • If the .extension is longer than 9 characters, reject it
  • If it contains no full stops, reject it

Anybody got an idea, thanks :)

A: 

Why not have a stage before the regexp that simply removes the http:// if present? The same would apply to the www. prefix. That may make your life a bit easier.
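
A minimal PHP sketch of that idea, checking the conditions from the question once the prefix is gone. The function names strip_url_prefix and looks_like_web_address are made up for illustration, and "extension" is assumed to mean the text after the last full stop in the host part:

```php
<?php
// Hypothetical helper (name is illustrative): drop an optional scheme and "www." prefix.
function strip_url_prefix(string $url): string
{
    // Remove a leading "http://" or "https://", case-insensitively.
    $url = preg_replace('~^https?://~i', '', trim($url));
    // Remove a leading "www." if one remains.
    return preg_replace('~^www\.~i', '', $url);
}

// Hypothetical check for the question's conditions, applied after stripping.
function looks_like_web_address(string $url): bool
{
    $rest = strip_url_prefix($url);
    // Treat everything before the first "/", "?" or "#" as the host part.
    $host = preg_split('~[/?#]~', $rest, 2)[0];
    $lastDot = strrpos($host, '.');
    if ($lastDot === false) {
        return false; // no full stop at all
    }
    // Length of the extension, i.e. the text after the last full stop.
    $extensionLength = strlen($host) - $lastDot - 1;
    return $extensionLength >= 1 && $extensionLength <= 9;
}
```

This is only a rough filter. It does not handle ports or check which characters are used, so it would need another step if that matters.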

Brian Agnew
A: 
/^(http:\/\/|www\.)/

/^.+?\.\S{0,9}\./

/\./

Those should work for your bullet points?

dangerstat
+2  A: 

Can't you just use the built-in filter_var function?

filter_var('example.com', FILTER_VALIDATE_URL);

Not sure about the nine chars extension limit, but I guess you could easily check this in an additional step.
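
One caveat worth checking on your PHP version: as far as I know, FILTER_VALIDATE_URL only accepts input that includes a scheme, so a bare host name fails. A small sketch (the wrapper name is_valid_url is made up for illustration):

```php
<?php
// Hypothetical wrapper: filter_var() returns the URL string on success
// and false on failure, so compare strictly against false.
function is_valid_url(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_valid_url('http://example.com')); // bool(true)
var_dump(is_valid_url('example.com'));        // bool(false), no scheme
```

So scheme-less input would need a scheme added before it is passed to filter_var.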

Gordon
filter_var can be set up to require the scheme (the http:// part) or not; if you don't require it, the URL validates whether the scheme is there or not. People love to reinvent the wheel, and they usually make it square.
Erik
A: 

not everybody uses the http://

They should. Without a scheme it simply isn't a URL, and omitting it can cause weird problems. For example:

www.example.com:8080/file.txt

This is a valid URL with the nonexistent scheme www.example.com:.

If you are sure that the normal scheme should be http:, you could try automatically prepending http:// to ‘fix up’ any URL that doesn't begin with https?:, before validation. But you shouldn't allow/keep/return schemeless URLs over the longer term.
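
A rough sketch of that fix-up step in PHP, assuming http is the right default (the function name add_default_scheme is made up for illustration):

```php
<?php
// Hypothetical helper: prepend "http://" unless the input already
// starts with "http:" or "https:" (case-insensitive).
function add_default_scheme(string $url): string
{
    $url = trim($url);
    if (preg_match('~^https?:~i', $url)) {
        return $url; // already has an http(s) scheme, leave it alone
    }
    return 'http://' . $url;
}
```

With this, www.example.com:8080/file.txt becomes http://www.example.com:8080/file.txt, so the part before the colon is no longer read as a scheme. Note that input with any other scheme, such as ftp:, also gets http:// put in front of it, which is only acceptable if you really expect nothing but web addresses.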

Incidentally, the regex you are currently using is a long way from accurate according to the official URI syntax (see RFC 3986). It will disallow many valid URI characters, not to mention Unicode characters in IRIs. If you want proper validation you should use a real URL parser; if you just want a quick check for obvious problems you should use something much more permissive, for example just checking for the absence of categorically invalid characters such as space and ".
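
As a rough illustration of the permissive option in PHP (the function name is made up, and treating control characters as invalid alongside space and " is my own assumption):

```php
<?php
// Hypothetical quick check: reject only characters that cannot appear raw
// in a URL, and accept everything else, including non-ASCII IRI characters.
function has_no_obviously_invalid_chars(string $url): bool
{
    if ($url === '') {
        return false; // nothing to check
    }
    // \x00-\x20 covers control characters and space, \x7F is DEL.
    return !preg_match('~[\x00-\x20\x7F"]~', $url);
}
```

This says nothing about whether the URL is well formed; it only catches input that is certainly wrong.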

bobince