The regular expression below picks up URLs, including bare ones such as example.com. However, I want it to pick up only URLs that start with www. or with a scheme such as http or https. In other words, it should match www.example.com but not example.com.

((((ht|f)tp(s?))\://)?((www.|[a-zA-Z])([a-zA-Z0-9\-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*)
A: 

Hmm, try:

(((((ht|f)tp(s?))\://)|(www\.))((|[a-zA-Z])([a-zA-Z0-9-]+.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)

EDIT: Yeah, I didn't really test that one. Ok, I didn't test this one either but I looked at it REALLY carefully ;)

(((((ht|f)tp(s?))\://)|(www\.))(([a-zA-Z0-9-]+.)?([a-zA-Z0-9]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)

You should look into a good regex tester. I usually use Expresso but there are many others out there.

FrustratedWithFormsDesigner
This one seems to cut off randomly. For example, when trying http://www.yahoo.com, it cuts off the .com. It also happens for other instances where http:// is used so it's not always at the .com.
Mike
@Mike: New expression, try it out.
FrustratedWithFormsDesigner
Thanks! It's working a lot better. I'll do more thorough testing, but all previous issues seem to have been resolved.
Mike
A: 

I modified your expression:

((((ht|f)tp(s?))\://)?((www\.)([a-zA-Z0-9-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)

A pretty good website for checking your expressions is here: http://gskinner.com/RegExr/

Philipp G
This worked exactly the way I wanted. Thanks a lot!
Mike
Sorry. I replied too quickly without thoroughly testing. It does check for the www. etc. However, it no longer picks up URLs with a subdomain.
Mike
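The subdomain problem reported in the comment above can be reproduced with a small harness. This is only a sketch in Python: the sample URLs and the helper name `matches_whole` are made up for illustration, and it assumes the whole string must match (`re.fullmatch`), which may differ from how the expression is applied in the original application.

```python
import re

# The modified expression from this answer, copied verbatim
# (split across two string literals only for line length).
PATTERN = re.compile(
    r"((((ht|f)tp(s?))\://)?((www\.)([a-zA-Z0-9-]+\.)([a-zA-Z]{2,8}))"
    r"(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)"
)

# Hypothetical sample inputs, not taken from the original thread.
SAMPLES = [
    "www.example.com",         # www. prefix: wanted
    "http://www.example.com",  # scheme plus www.: wanted
    "example.com",             # bare domain: not wanted
    "http://sub.example.com",  # scheme plus subdomain: wanted, but rejected
]


def matches_whole(url: str) -> bool:
    """Return True if the entire string matches the expression."""
    return PATTERN.fullmatch(url) is not None


for url in SAMPLES:
    print(f"{url!r}: {matches_whole(url)}")
```

Because `www\.` is no longer optional, a URL with a scheme but a host that does not begin with `www.` is rejected, which is consistent with the behavior described in the comment.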
+1  A: 

Validate that the URI is well-formed with a regexp; use the one from RFC 3986. Then validate that it is plausible with code. Trying to combine the checks for well-formed and plausible into one regexp is too difficult to get right. See: Need a regex to validating a Url...

Wayne Conrad
Good point, probably easier to reject special cases after verifying the input is well formed.
FrustratedWithFormsDesigner
I will give it a shot.
Mike
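A minimal sketch of the two-step approach from this answer, in Python. The regex is the generic URI-splitting expression from RFC 3986, Appendix B. Everything else is an assumption made for illustration: the function name `is_wanted_url`, the set of allowed schemes, and the rule that a scheme-less candidate must start with `www.`.

```python
import re

# URI-splitting regex from RFC 3986, Appendix B.
# Group 2 = scheme, group 4 = authority, group 5 = path.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Assumption: these are the schemes the question cares about.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_wanted_url(candidate: str) -> bool:
    """Step 1: split with the RFC regex. Step 2: plausibility checks in code."""
    m = URI_RE.match(candidate)  # every part is optional, so this always matches
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    if scheme is not None:
        # Has a scheme: require an allowed one and a non-empty host part.
        return scheme.lower() in ALLOWED_SCHEMES and bool(authority)
    # No scheme: the whole candidate lands in the path component,
    # so accept it only if it starts with "www.".
    return path.lower().startswith("www.")


for url in ["www.example.com", "example.com", "http://sub.example.com"]:
    print(url, is_wanted_url(url))
```

One limitation of this sketch: a scheme-less host with a port, such as `www.example.com:8080`, is split by the RFC regex with the host in the scheme position, so it would need an extra check in code.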
A: 

Here you go:

\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))

It's the revised Liberal URL Regex from Daring Fireball.

Alix Axel
Mike
@Mike: The regex I provided doesn't match `asfjkljswww.yahoo.com`, check again.
Alix Axel
You're correct. I must've made a mistake when I copied it over. This works very well. Thank you for your help!
Mike
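For anyone who wants to check the expression from this answer, here is a short sketch in Python. The sample text and the name `find_urls` are invented for illustration, and the `re.IGNORECASE` flag is an assumption, added because the pattern itself only lists lowercase letters.

```python
import re

# The expression from this answer, copied verbatim into a raw string.
URL_RE = re.compile(
    r"""\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))""",
    re.IGNORECASE,
)


def find_urls(text: str) -> list[str]:
    """Return every URL-like substring found in text."""
    # The pattern has a single capturing group around the whole URL,
    # so findall returns a plain list of strings.
    return URL_RE.findall(text)


# Hypothetical sample text covering the cases discussed in the thread.
sample = (
    "Visit www.example.com, http://www.yahoo.com or example.com. "
    "Not asfjkljswww.yahoo.com"
)
print(find_urls(sample))
```

The leading `\b` is what keeps `asfjkljswww.yahoo.com` from matching: there is no word boundary between the `s` and the first `w`, and the string contains no colon for the scheme branch to use.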