I have the following string. How can I extract the "somesite.com/2009/10/monit-on-ubuntu/" part from it using a Ruby regular expression?

http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t

The common pattern is that the path starts with "/to/some-alpha-num" and always ends with "/t".

A: 

Maybe with /\/to\/[^\/]*\/(.*)\/t/:

"http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t" =~ /\/to\/[^\/]*\/(.*)\/t/
puts $1

-> somesite.com/2009/10/monit-on-ubuntu
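
Note that the question asks for the result with its trailing slash, while the pattern above captures it without. A minimal sketch of both variants, assuming the same sample URL and using `%r{}` delimiters and `String#[]` instead of `=~` and `$1` (the variable names are only illustrative):

```ruby
url = "http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t"

# Slash outside the group: the capture stops before the final "/".
without_slash = url[%r{/to/[^/]*/(.*)/t}, 1]

# Slash inside the group: the capture keeps the final "/".
with_slash = url[%r{/to/[^/]*/(.*/)t}, 1]

puts without_slash  # somesite.com/2009/10/monit-on-ubuntu
puts with_slash     # somesite.com/2009/10/monit-on-ubuntu/
```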

nacmartin
A: 
/to/\w+/(.*?)/t
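
The lazy `(.*?)` stops at the first "/t" it can find. That happens to be the final one in the sample URL, but it may not be in general. A small sketch, assuming a made-up URL whose target path contains a "/tags/" segment, compares the unanchored lazy pattern with an anchored one:

```ruby
# Hypothetical URL, invented for illustration: the target path contains "/tags/".
url = "http://linkto.com/to/abc123/example.org/tags/ruby/t"

lazy_unanchored = %r{/to/\w+/(.*?)/t}
lazy_anchored   = %r{/to/\w+/(.*?)/t\z}

unanchored_capture = url[lazy_unanchored, 1]
anchored_capture   = url[lazy_anchored, 1]

puts unanchored_capture  # example.org
puts anchored_capture    # example.org/tags/ruby
```

With the anchor in place, lazy and greedy quantifiers give the same capture here, because the match is forced to run up to the final "/t".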
Rubens Farias
+2  A: 

/\/to\/\w+\/(.*)\/t/i

A great resource is Rubular. It allows you to test your expression against inputs and see the matches.

Erik Nedwidek
Rubular is a good tool; I like it.
Steve Zhang
Suffers from leaning toothpick syndrome. Use `%r` to choose different delimiters.
glenn jackman
+5  A: 

That string looks like it's actually not a string but a URI. So, let's treat it as one:

require 'uri'
uri = URI.parse(str)

Now, extracting the path component of the URI is a piece of cake:

path = uri.path

Now we have already greatly limited the amount of stuff that can go wrong with our own parsing. The only part of the URI we still have to deal with is the path component.
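
To make that concrete, here is a short sketch of what the parsed URI exposes for the URL from the question (`str` is assumed to hold that URL):

```ruby
require 'uri'

# Assumption: str holds the URL from the question.
str = 'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'
uri = URI.parse(str)

puts uri.host  # linkto.com
puts uri.path  # /to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t
```

The scheme and host are already out of the way, so the Regexp only has to describe the path.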

A Regexp that matches the part you are interested in looks like this:

%r|/to/\w+/(.*/)t$|i

If we put all of that together, we end up with something like this:

require 'uri'

def URI.extract(uri)
  return parse(uri).path[%r|/to/\w+/(.*/)t$|i, 1]
end

require 'test/unit'
class TestUriExtract < Test::Unit::TestCase
  def test_that_the_path_gets_extracted_correctly
    uri  = 'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'
    path = 'somesite.com/2009/10/monit-on-ubuntu/'
    assert_equal path, URI.extract(uri)
  end
end
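
One caveat: Ruby's standard library has historically shipped its own `URI.extract`, so defining a method with that name replaces it. A possible variant, sketched here with a hypothetical top-level helper named `extract_target_path`, avoids touching the `URI` module and returns nil when the path does not have the expected shape:

```ruby
require 'uri'

# Hypothetical helper name, chosen so that nothing on the URI module is redefined.
def extract_target_path(url)
  path = URI.parse(url).path
  # path can be nil for opaque URIs (e.g. mailto:), hence the guard.
  path && path[%r{\A/to/\w+/(.*/)t\z}i, 1]
end

sample = 'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'
puts extract_target_path(sample)  # somesite.com/2009/10/monit-on-ubuntu/
p extract_target_path('http://linkto.com/about')  # nil
```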
Jörg W Mittag
upvoted for using behavior-driven answering :)
Adrian
+2  A: 

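The answers so far are right, but you should make sure the trailing /t is really at the end of the string by using the $ anchor. (In Ruby, $ matches at the end of a line; use \z if you need the strict end of the string.)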

regex = %r(/to/[^/]+/(.*)/t$)
'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t' =~ regex
puts $1
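
In Ruby, `$` matches at the end of any line, while `\z` matches only at the very end of the string. A brief sketch of the difference, assuming a made-up input with a second line appended:

```ruby
regex_dollar = %r(/to/[^/]+/(.*)/t$)
regex_z      = %r(/to/[^/]+/(.*)/t\z)

# Hypothetical input, invented for illustration: a URL followed by another line.
multiline = "http://linkto.com/to/abc/example.org/page/t\nextra"

dollar_capture = multiline[regex_dollar, 1]
z_capture      = multiline[regex_z, 1]

puts dollar_capture  # example.org/page
p z_capture          # nil
```

For single-line input such as the URL in the question, both anchors behave the same.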
Adrian
Thanks. I updated my answer accordingly.
Jörg W Mittag
A: 
s = "http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t"
puts s[/to\/.+?\/(.*)\/t$/, 1]
=> somesite.com/2009/10/monit-on-ubuntu
jhickner