I want to extract URLs from a webpage. These are plain-text URLs, not hyperlinks. Some examples would be http://www.example.com, http://example.com, www.example.com, etc. I am extremely new to regex, so I have copied and pasted about 20 expressions I found online, and all of them failed to work. I don't know whether I am doing it right. Any help would be really appreciated.

A: 

You're probably not escaping your dots. An unescaped . matches any single character, so you need to write \. wherever you mean a literal dot.
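To make the difference concrete, here is a minimal Python sketch (the sample strings and the names `loose` and `strict` are made up for illustration). It compares a pattern with unescaped dots against one with escaped dots:

```python
import re

# Unescaped dots: "." matches any single character, not just a period.
loose = re.compile(r"www.example.com")
# Escaped dots: "\." matches only a literal period.
strict = re.compile(r"www\.example\.com")

samples = ["www.example.com", "wwwXexampleYcom"]

for s in samples:
    # Print whether each pattern matches the whole sample string.
    print(s, bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
```

The loose pattern accepts both strings, while the strict one accepts only the real hostname.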

Take a look at strfriend.com. It has a URL example, and represents it graphically.

The example it suggests is:

^((ht|f)tp(s?)\:\/\/|~/|/)?(\w+:\w+@)?([a-zA-Z]{1}([\w-]+\.)+(\w{2,5}))(:\d{1,5})?((/?\w+/)+|/?)(\w+\.\w{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?
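Note that the pattern above starts with ^, so it validates a string that is a URL rather than finding URLs inside a longer text. For pulling plain-text URLs out of a page, an unanchored search is needed. The following is only a rough Python sketch under my own assumptions: `URL_RE` and `extract_urls` are hypothetical names, and the simplified pattern is not the strfriend.com one. It only recognizes URLs that start with http://, https://, or www.

```python
import re

# Simplified, hypothetical pattern (an assumption, not the regex quoted above):
URL_RE = re.compile(
    r"\b(?:https?://|www\.)"      # must start with a scheme or "www."
    r"(?:[\w-]+\.)+[a-zA-Z]{2,}"  # dot-separated host labels, then a TLD
    r"(?:/\S*)?"                  # optional path up to the next whitespace
)

def extract_urls(text):
    """Return all URL-like substrings, with trailing punctuation trimmed."""
    return [m.group(0).rstrip(".,;:!?)") for m in URL_RE.finditer(text)]

text = "Try http://www.example.com, http://example.com or www.example.com."
print(extract_urls(text))
# ['http://www.example.com', 'http://example.com', 'www.example.com']
```

A bare example.com (no scheme, no www.) is deliberately not matched here, because loosening the pattern that far tends to pick up ordinary words separated by dots.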

Eric
A: 

I wrote a post on using regex to locate links within an HTML page (the intent was to use JavaScript to open external links, or links to documents such as PDFs, in a popup window).

The final regex was: ^(?:[.\/]+)?(?:Assets|https?:\/\/(?!(?:www\.)?integralist))

The full post is here: http://www.integralist.co.uk/javascript/regular-expression-to-open-external-links-in-popup-window/

The solution won't be perfect, but it might help point you in the right direction.
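As a rough illustration of what that regex does, here is a small Python sketch. The sample href values and the name `opens_in_popup` are made up, and the dot after www is escaped here. The pattern matches link targets that either point into an Assets folder or go to an http(s) host other than integralist:

```python
import re

# The regex from this answer, written as a Python raw string.
EXTERNAL_RE = re.compile(
    r"^(?:[./]+)?(?:Assets|https?://(?!(?:www\.)?integralist))"
)

def opens_in_popup(href):
    """True if the href is an Assets document or an external http(s) link."""
    return EXTERNAL_RE.search(href) is not None

hrefs = [
    "http://www.example.com/page",         # external site
    "http://www.integralist.co.uk/about",  # same site, excluded by the lookahead
    "../Assets/report.pdf",                # document under Assets
    "/contact",                            # internal relative link
]

for href in hrefs:
    print(href, opens_in_popup(href))
```

Because it is anchored with ^ and expects an href value, it classifies links rather than extracting bare URLs from text, so it would need adapting for the question as asked.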

Mark

Mark McDonnell