Hello!

I want to extract links from a text using a regular expression. There is no HTML in the text, so I can't search for "<a..." tags. How can I find the links?

Example:

"Please go to http://www.example.org/page1.html and click on ..."

I want to extract the text:

"http://www.example.org/page1.html"

As far as I know, a URL can contain the following characters:

a-z A-Z 0-9 /.#?=&+,@-_~

I hope you can help me. Thanks in advance!

The URL must start with "http", and before that there must be a space or the beginning of the text.
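The requirements above could be sketched roughly like this in Python. This is only an illustration, not a complete URL parser: the name URL_PATTERN is made up, the character class is copied from the list in the question, and I am assuming "https" should be allowed too. The "://" is matched literally because the colon is not in that list.

```python
import re

# Hypothetical pattern built from the question's requirements (an assumption,
# not a full URL grammar):
#   (?<!\S)   the match must be preceded by whitespace or the start of the text
#   https?:// the URL starts with "http" (an optional "s" is assumed to be fine)
#   [...]+    one or more characters from the list in the question;
#             the hyphen is placed last so it is treated literally
URL_PATTERN = re.compile(r"(?<!\S)https?://[A-Za-z0-9/.#?=&+,@_~-]+")

text = "Please go to http://www.example.org/page1.html and click on ..."
links = URL_PATTERN.findall(text)
print(links)
```

For the example sentence this finds only "http://www.example.org/page1.html". A string such as "xhttp://..." is skipped because the "http" is not preceded by whitespace.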

+3  A: 

Already answered:

http://stackoverflow.com/questions/6173/regular-expression-for-parsing-links-from-a-webpage

Codebrain
Thanks! I used the search but I didn't find that. The accepted answer doesn't help me, but Jeff Atwood's answer does: "\b(https?|ftp|file)://[-A-Z0-9+
Do you mean it returns the dot as part of the matched text? It doesn't do that when I try it.
Alan Moore
Yes, it does. Try "Go to http://www.example.org." and you'll see.
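One possible way around the trailing dot, sketched in Python: match generously, then strip sentence punctuation from the end of each match. The pattern here is a simplified stand-in based on the character list in the question, not the exact expression from the linked answer, and the names URL_PATTERN, TRAILING_PUNCTUATION and extract_links are made up for this example.

```python
import re

# Simplified, hypothetical pattern (an assumption, not the linked answer's regex).
# Because "." is an allowed URL character, a sentence-ending dot gets matched too.
URL_PATTERN = re.compile(r"(?<!\S)https?://[A-Za-z0-9/.#?=&+,@_~-]+")

# Characters assumed to be sentence punctuation when they end a match.
TRAILING_PUNCTUATION = ".,?"

def extract_links(text):
    """Return all matched URLs with trailing punctuation stripped."""
    return [match.rstrip(TRAILING_PUNCTUATION) for match in URL_PATTERN.findall(text)]

print(URL_PATTERN.findall("Go to http://www.example.org."))
print(extract_links("Go to http://www.example.org."))
```

The first call shows the problem (the dot is part of the match), the second returns the URL without it. The trade-off is that a URL which really does end in one of those characters would be shortened as well.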