ansaurus

Question

Regex to validate URL - Not checking for HTTP?

Answer 1

A:

Why not add a step before the regex that simply removes the http:// if it is present? The same applies to the www. That may make your life a bit easier.
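Here is a minimal PHP sketch of that pre-processing step. It assumes the input is a plain string and that https:// should be stripped as well, although the answer only mentions http://. The function name stripSchemeAndWww is hypothetical.

```php
<?php
// Remove an optional leading http:// or https:// and an optional "www."
// so that a later validation regex only has to deal with the remainder.
function stripSchemeAndWww(string $url): string
{
    // '#' is the delimiter, so the slashes in "://" need no escaping;
    // the 'i' modifier also strips "HTTP://" and "WWW.".
    return preg_replace('#^(?:https?://)?(?:www\.)?#i', '', $url);
}

echo stripSchemeAndWww('http://www.example.com/page'), "\n"; // example.com/page
echo stripSchemeAndWww('www.example.com'), "\n";             // example.com
echo stripSchemeAndWww('example.com'), "\n";                 // example.com
```

Because the prefix is optional, input that has neither part passes through unchanged.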

Brian Agnew 2010-02-14 09:41:52

Answer 2

A:

/^(http:\/\/|www\.)/

/^.+?\.\S{0,9}\./

/\./

Those should work for your bullet points?
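The answer does not show how the patterns are applied. This is a minimal sketch that assumes PHP's preg_match() and requires an input to pass all three checks. The first pattern uses # as its delimiter so the slashes in http:// need no escaping. The labels only describe what each pattern literally matches, and failedChecks is a hypothetical helper name.

```php
<?php
// The three patterns from the answer, keyed by what each one literally matches.
$checks = [
    'starts with http:// or www.'               => '#^(http://|www\.)#',
    'a dot, up to 9 non-space chars, a dot'     => '/^.+?\.\S{0,9}\./',
    'contains a dot'                            => '/\./',
];

// Return the labels of every check the input fails.
function failedChecks(string $url, array $checks): array
{
    $failed = [];
    foreach ($checks as $label => $pattern) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $url) !== 1) {
            $failed[] = $label;
        }
    }
    return $failed;
}

print_r(failedChecks('www.example.com', $checks)); // no failed checks
print_r(failedChecks('example', $checks));         // all three labels
```

The second pattern needs two dots in the input, so a URL such as http://example.com fails it.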

dangerstat 2010-02-14 09:42:16

Answer 3

+2 A:

Can't you just use the built-in filter_var function?

filter_var('example.com', FILTER_VALIDATE_URL);

Not sure about the nine-character extension limit, but I guess you could easily check this in an additional step.
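Both steps can be combined. In current PHP, FILTER_VALIDATE_URL always requires a scheme and a host, so filter_var('example.com', FILTER_VALIDATE_URL) returns false. The FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags were deprecated in PHP 7.3 and removed in 8.0 for that reason. This minimal sketch assumes that schemeless input should be treated as http and that the "extension" means the last dot-separated label of the host; the question's exact rule is not shown here. The function isAcceptableUrl and its $maxTldLength parameter are hypothetical.

```php
<?php
// FILTER_VALIDATE_URL rejects input without a scheme and host, so
// schemeless input gets "http://" prepended before validation.
function isAcceptableUrl(string $input, int $maxTldLength = 9): bool
{
    // Assumption: input without any "scheme://" prefix is meant to be http.
    $url = preg_match('#^[a-z][a-z0-9+.-]*://#i', $input) === 1
        ? $input
        : 'http://' . $input;

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Assumption: the "extension" is the part after the host's last dot.
    $dot = strrpos($host, '.');
    if ($dot === false) {
        return false;
    }
    $tld = substr($host, $dot + 1);
    return $tld !== '' && strlen($tld) <= $maxTldLength;
}

var_dump(filter_var('example.com', FILTER_VALIDATE_URL)); // bool(false)
var_dump(isAcceptableUrl('example.com'));                 // bool(true)
var_dump(isAcceptableUrl('example.verylongtld'));         // bool(false)
```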

Gordon 2010-02-14 10:04:11

filter_var can be set up to require the scheme (the http:// part) or not; if you don't require it, it will validate whether the scheme is there or not. People love to reinvent the wheel. And they usually make it square.

Erik 2010-02-14 10:18:18

Answer 4

A:

not everybody uses the http://

They should. Without a scheme it simply isn't a URL, and omitting it can cause weird problems. For example:

www.example.com:8080/file.txt

This is a valid URL with the non-existent scheme www.example.com:.
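To see why, this minimal sketch applies only the scheme rule of RFC 3986 (section 3.1: a letter followed by letters, digits, "+", "-" or ".", then a colon). It is not a full URI parser, and schemeOf is a hypothetical name.

```php
<?php
// Return the leading scheme if the string starts with one per RFC 3986's
// scheme grammar, otherwise null.
function schemeOf(string $uri): ?string
{
    if (preg_match('/^([A-Za-z][A-Za-z0-9+.-]*):/', $uri, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(schemeOf('www.example.com:8080/file.txt'));        // string(15) "www.example.com"
var_dump(schemeOf('http://www.example.com:8080/file.txt')); // string(4) "http"
```

Dots are legal scheme characters, so nothing in the grammar stops www.example.com from being read as the scheme.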

If you are sure that the normal scheme should be http:, you could try automatically prepending http:// to ‘fix up’ any URL that doesn't begin with https?:, before validation. But you shouldn't allow/keep/return schemeless URLs over the longer term.
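This is a minimal sketch of that fix-up step, assuming plain http is the right default for every schemeless input. As the rule is stated, anything that does not start with http: or https: gets the prefix, including other schemes such as ftp:. fixUpUrl is a hypothetical name.

```php
<?php
// Prepend "http://" unless the input already starts with "http:" or
// "https:" (case-insensitive). Run validation on the result afterwards.
function fixUpUrl(string $input): string
{
    if (preg_match('/^https?:/i', $input) === 1) {
        return $input;
    }
    return 'http://' . $input;
}

echo fixUpUrl('www.example.com:8080/file.txt'), "\n"; // http://www.example.com:8080/file.txt
echo fixUpUrl('https://example.com/'), "\n";          // https://example.com/

// Validate only after fixing up.
var_dump(filter_var(fixUpUrl('www.example.com:8080/file.txt'), FILTER_VALIDATE_URL) !== false); // bool(true)
```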

Incidentally, the regex you are currently using is a long way from accurate according to the official URI syntax (see RFC 3986). It will reject many valid URI characters, not to mention the Unicode characters allowed in IRIs. If you want proper validation, you should use a real URL parser; if you just want a quick check for obvious problems, you should use something much more permissive, for example just checking for the absence of categorically invalid characters such as the space and the double quote (").
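This is a minimal sketch of such a permissive quick check. It assumes that any whitespace, not only the space, should be rejected along with the double quote. RFC 3986 also excludes characters such as < and >, which could be added to the character class. Because nothing else is rejected, non-ASCII IRI characters pass. passesQuickCheck is a hypothetical name.

```php
<?php
// Reject only empty input and input containing whitespace or '"';
// everything else, including non-ASCII characters, is allowed through.
function passesQuickCheck(string $url): bool
{
    // preg_match() returns 0 when none of the listed characters is present.
    return $url !== '' && preg_match('/[\s"]/', $url) === 0;
}

var_dump(passesQuickCheck('http://example.com/a?b=c'));   // bool(true)
var_dump(passesQuickCheck('http://example.com/my file')); // bool(false)
var_dump(passesQuickCheck('http://例え.jp/'));             // bool(true)
```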

bobince 2010-02-14 10:52:15
