views:

45

answers:

2

How to extract all URLs from a text in Ruby?

I tried some libraries, but they fail in some cases. What's the best way?

+2  A: 

You can use a regex with String#scan:

string.scan(/(https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)/)

You can get started with that regex and adjust it according to your needs.
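
One thing to watch for: when the pattern contains capture groups, String#scan returns an array of group arrays rather than whole matches. A minimal sketch of one way to adjust it, assuming you only want the full URL strings: the pattern below is a simplified variant of the one above with non-capturing groups, and the names URL_PATTERN and extract_urls and the sample text are made up for illustration.

```ruby
# Simplified variant of the pattern above (an assumption, not the original):
# non-capturing groups (?:...) make String#scan return whole matches
# instead of one array of captures per match.
URL_PATTERN = %r{https?://[-\w.]+(?::\d+)?(?:/[\w/.]*(?:\?\S+)?)?}

# Hypothetical helper: returns every substring that matches URL_PATTERN.
def extract_urls(text)
  text.scan(URL_PATTERN)
end

text = "See http://example.com/docs and https://example.org:8080/a/b?x=1 for details."
urls = extract_urls(text)
puts urls
```

Like the original, this only covers http and https, and it will pick up a trailing dot if a URL ends a sentence, so treat it as a starting point.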

NullUserException
+1  A: 

What cases are failing?

According to the library regexpert, you can use

regexp = /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix

and then perform a scan on the text.

EDIT: It seems the regexp also matches the empty string. Just remove the initial (^$) alternative and you're done.
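
A rough sketch of what that could look like, with the (^$) alternative removed. Because the pattern is anchored with ^ and $, it only matches a URL that fills a whole line, so this example checks whitespace-separated tokens one at a time instead of scanning the raw text. The names URL_LINE and urls_in and the sample string are assumptions for illustration, not part of regexpert.

```ruby
# The pattern from above without the leading (^$) alternative.
# The ^ and $ anchors mean it validates a whole candidate string.
URL_LINE = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

# Hypothetical helper: split on whitespace and keep tokens that validate.
def urls_in(text)
  text.split.select { |token| token.match?(URL_LINE) }
end

sample = "Docs at http://example.com/path and a router at http://192.168.0.1/ too"
found = urls_in(sample)
puts found
```

Note that the IP-address URL in the sample is not picked up, since the pattern requires an alphabetic top-level domain.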

Chubas
Interesting how this regex fails when the URL is an IP address
NullUserException
Yep. I actually upvoted your answer because of the "and adjust it according to your needs". It also fails when the URL contains user:password@ credentials, or uses a scheme other than https?, or in any other weird situation. You probably wouldn't want to read http://tools.ietf.org/html/rfc3986 to get started -_-
Chubas
It fails as described above. I am asking here precisely because I am unable to "adjust it according to your needs".
tapioco123