I have a Ruby app parsing a bunch of URLs from strings:

@text = "a string with a url http://example.com"

urls = @text.split.grep(/http[s]?:\/\/\w/)

urls[0]  # => "http://example.com"

This works fine ^^

But sometimes the URLs have text directly before the http://, for example:

@text = "What's a spacebar? ...http://example.com"

@text.split.grep(/http[s]?:\/\/\w/)[0]  # => "...http://example.com"

Is there a regex that can select just the text before "http://" in a string so I can strip it out?

+2  A: 
.*(?=http://)
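
A minimal sketch of how this lookahead might be applied in Ruby, assuming String#sub is used to drop the prefix. The variable names are illustrative, and the slashes have to be escaped inside a /.../ literal:

```ruby
text = "What's a spacebar? ...http://example.com"

# Remove everything before the first "http://".
# The lookahead (?=...) checks for the scheme without consuming it, so the URL survives.
# .*? is lazy: with several URLs in the string it stops at the first one,
# whereas a greedy .* would strip everything up to the last one.
url = text.sub(/\A.*?(?=http:\/\/)/, "")

puts url
```

If the string contains no "http://", the pattern does not match and sub returns the string unchanged.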
chaos
A: 

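Or you could combine the two, covering both ftp and http. Note that the s has to be optional (s? rather than [s]), otherwise plain http:// will not match:

.*(?=(f|ht)tps?://)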
Reactor5
A: 

Just search for http://, then remove the part of the string before it (=~ returns the offset of the match within the string).
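
As a rough sketch of that idea (the variable names are made up for illustration, and the fallback for strings without a URL is an assumption about the desired behaviour):

```ruby
text = "What's a spacebar? ...http://example.com"

# =~ returns the index of the first match, or nil if nothing matches
offset = text =~ /https?:\/\//

# Keep everything from the match onward; leave the string as-is if no URL was found
stripped = offset ? text[offset..-1] : text

puts stripped
```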

Pod
+5  A: 

Splitting and then grepping is an odd way to do this. Why don't you just use String#scan:

@text = "a string with a url http://example.com"
urls = @text.scan(/http[s]?:\/\/\S+/)
urls[0]  # => "http://example.com"
Pesto
Thanks, this solved my problem - it ignores everything preceding the matching text.
dMix
+6  A: 

Perhaps a nicer way to achieve the same result is to use the URI standard library.

require 'uri'
text = "a string with a url http://example.com and another URL here:http://2.example.com and this here"
URI.extract(text, ['http', 'https'])
# => ["http://example.com", "http://2.example.com"]

Documentation: URI.extract

Olly