Say a user submits this comment on a blog:

@SO - Great community, but we've also seen some great communities at Stack Overflow. At the same time Google's Gmail (http://gmail.com) is a great example of a community with endless bounds. I'm just wondering if anyone will really go toe-to-toe with something like http://www.twitter.com. What do you think?

Note: the 3rd url was actually posted as plain text, but SO converted it to a hyperlink.

Anyways, the total url and hyperlink count should be 3.

So, from a Ruby and/or Ruby on Rails perspective: how do I count the number of occurrences of URLs and hyperlinks in a Ruby string?

A: 

The easiest way is to scan for the "http" pattern, but in practice it can be more complicated, because URLs sometimes don't start with "http://":

string = "@SO - Great community, but we've also seen some great communities at <a href='http://blabla'>Stack Overflow</a>. At the same time Google's Gmail (http://gmail.com) is a great example of a community with endless bounds. I'm just wondering if anyone will really go toe-to-toe with something like http://www.twitter.com. What do you think?"
string.scan(/http/).size #=> 3
fl00r
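To also cover "https://" and URLs that start with a bare "www.", the scan can use a slightly broader pattern. This is only a rough sketch: `URL_PATTERN` and `count_urls` are names made up for illustration, and the pattern is deliberately loose (it will swallow trailing punctuation such as a closing parenthesis or a full stop), which does not matter when you only need a count.

```ruby
# Hypothetical pattern: a URL starts with http://, https:// or a bare "www."
# and runs until whitespace, an angle bracket or a quote.
URL_PATTERN = %r{(?:https?://|\bwww\.)[^\s<>"']+}i

# Hypothetical helper: number of URL-like substrings in the text.
def count_urls(text)
  text.scan(URL_PATTERN).size
end

comment = "Try <a href='http://blabla'>Stack Overflow</a>, " \
          "Gmail (http://gmail.com) or http://www.twitter.com."
puts count_urls(comment)
```

Because each match consumes the whole URL, "http://www.twitter.com" is counted once rather than once for "http://" and again for "www.".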
A: 

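Using regular expressions is a good way. Here is an example of how to do that:

# The pattern is written with %r{} so the slashes inside it need no escaping
url_pattern = %r{^(((ht|f)tps?://)|~/|/)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5})(:[\d]{1,5})?)/?(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?}
url_count = 0
# yourpost is assumed to be the comment text as a String
yourpost.split.each do |yourword|
  next unless yourword =~ url_pattern
  url_count += 1
  puts "Found URL #{$&} in #{yourword}"
end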

See this post for further discussion on regular expressions matching URLs.

Pablo Santa Cruz
@Pablo - Will this method properly count "www.google.com"?
Jack Benning
Hmm, not sure. Check out the link; it has a lot of discussion on URL searching with regexps.
Pablo Santa Cruz
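As an alternative to a hand-written pattern, Ruby's standard `uri` library can pull out URLs that carry a scheme. The sketch below uses `URI::RFC2396_Parser#extract`; `extract_urls` and the sample text are made up for illustration. Note that this approach only finds URLs with an explicit scheme, so a bare "www.google.com" would not be counted.

```ruby
require 'uri'

# Hypothetical helper: returns the URIs found in the text that use
# one of the given schemes.
def extract_urls(text, schemes = %w[http https])
  URI::RFC2396_Parser.new.extract(text, schemes)
end

sample = "Visit http://gmail.com or https://example.org/path today"
urls = extract_urls(sample)
puts urls.size
```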
+1  A: 

This is pretty easy, albeit relatively naive:

string.count("http://")

Of course, it won't pick up links without a leading "http://", but that might be a reasonable assumption.

Toby Hede
Oops: `string.count("http://")` #=> 54
fl00r
@Toby Thanks for the contribution, but you are correct in that your method is a bit naive. What about the "https://" case? Or a plain "www.*" case?
Jack Benning
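The surprisingly large number reported in the comments above comes from how `String#count` works: it treats its argument as a set of characters and counts every character in the string that belongs to that set, rather than counting occurrences of the substring. A small sketch of the difference, using a made-up sample string:

```ruby
text = "see http://a.example and http://b.example"

# String#count: counts every h, t, p, ':' and '/' character in the text.
char_count = text.count("http://")

# String#scan with a String argument: finds each occurrence of the substring.
substring_count = text.scan("http://").size # => 2

puts "characters from the set: #{char_count}"
puts "substring occurrences: #{substring_count}"
```

So for counting links, `scan(...).size` is the call that matches the intent of the answer.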