I would like to enumerate all the URLs in a text string, for example:

text = "fasòls http://george.it sdafsda"

For each URL found, I want to invoke a function method(...) that transforms the string.

Right now I'm using a method like this:

msg = ""
for i in text.split
  if (i =~ URI::regexp).nil?
    msg += " " + i
  else
    msg += " " + method(i)
  end
end
text = msg

This works, but it's slow for long strings. How can I speed this up?

A:

I think "gsub" is your friend here:

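class UrlParser
  # Transformed text, number of URLs replaced, and the URLs that were found.
  attr_accessor :text, :url_counter, :urls

  def initialize(text)
    @text = parse(text)
  end

  private

  # Replace each whitespace-delimited http:// URL with a numbered placeholder.
  def parse(text)
    @url_counter = 0
    @urls = []
    text.gsub(%r{(\A|\s+)(http://[^\s]+)}) do
      @urls << $2
      "#{$1}#{replace_url($2)}"
    end
  end

  # Return the placeholder for the next URL, e.g. "[1]".
  def replace_url(url)
    @url_counter += 1
    "[#{@url_counter}]"
  end
end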

parsed_url = UrlParser.new("one http://x.com/url two")
puts parsed_url.text
puts parsed_url.urls

If you really need very fast parsing of long strings, you could build a Ruby C extension with Ragel.
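
If you only need to run a function over each URL, as in the question, the same idea can be sketched without a class. This is a minimal sketch, not a definitive implementation: the helper name transform_urls and the pattern URL_PATTERN are hypothetical, and the pattern is deliberately simplistic (it treats any run of non-whitespace after http:// or https:// as a URL).

```ruby
# Simplistic pattern (an assumption): "http://" or "https://" followed by
# any run of non-whitespace characters.
URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: passes each matched URL to the given block and
# substitutes the block's return value in a single pass over the string.
def transform_urls(text)
  text.gsub(URL_PATTERN) { |url| yield url }
end

result = transform_urls("one http://x.com/url two") { |url| "<#{url}>" }
puts result
```

Unlike the split-and-rejoin loop in the question, this leaves the original whitespace between words untouched and does not build a new string for every word.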

Gaspard Bucher
If there are no URLs, text returns nil.
Luca Romagnoli
Oops, sorry, there was a typo. It should be "gsub", not "gsub!" (and of course, you might need a more robust URL regexp).
Gaspard Bucher
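
The nil mentioned in the comments comes from how the two methods report "nothing changed": String#gsub always returns a string, while String#gsub! returns nil when no substitution was made. A small sketch of the difference, using an illustrative pattern that is an assumption rather than the regexp from the answer:

```ruby
# Illustrative pattern (an assumption), not the regexp used in the answer.
pattern = %r{https?://\S+}
text = "no links here"

# gsub returns a new string even when nothing matches.
plain_result = text.gsub(pattern) { |url| "[#{url}]" }

# gsub! modifies the receiver in place and returns nil when nothing matches.
copy = text.dup
bang_result = copy.gsub!(pattern) { |url| "[#{url}]" }

p plain_result  # => "no links here"
p bang_result   # => nil
```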