How to load a web page and search for a word in Ruby?

+2  A: 

Here's a complete solution:

require 'open-uri'
# Use URI.open: on Ruby 3.0+ a plain open() no longer accepts URLs
if URI.open('http://example.com/').read =~ /searchword/
  # do something
end
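
The snippet above hard-codes the pattern. If the search word comes from a variable or from user input, characters such as "." or "+" would be read as regex syntax. A minimal sketch of one way to handle that, assuming a helper named includes_term? (a made-up name, not part of open-uri) and a fixed string standing in for the downloaded page:

```ruby
# Hypothetical helper (not part of open-uri): true if `term` occurs
# literally in `text`, ignoring case.
def includes_term?(text, term)
  # Regexp.escape turns metacharacters such as "." or "+" into literals
  pattern = Regexp.new(Regexp.escape(term), Regexp::IGNORECASE)
  pattern.match?(text)
end

# Stand-in for the page body; in real use this would be URI.open(url).read
page_text = 'Learn C++ and Ruby here'

puts includes_term?(page_text, 'c++')  # prints true
puts includes_term?('axb', 'a.b')      # prints false, the dot is literal
```

Regexp#match? needs Ruby 2.4 or later; on older versions =~ does the same job here.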
Peter
+1  A: 

You can also use the mechanize gem, with something similar to this:

require 'mechanize'

# The class is plain Mechanize since version 1.0 (formerly WWW::Mechanize)
Mechanize.new.get('http://example.com') do |page|
  if page.body =~ /mysearchregex/
    puts "found it"
  end
end
ttvd
+2  A: 

I suggest using Nokogiri or Hpricot to open and parse HTML documents. If you need something simple that doesn't require parsing the HTML, you can just use the open-uri library that ships with Ruby's standard library. If you need something more complex, such as posting forms or logging in, you can use Mechanize.

Nokogiri is probably the preferred solution post _why, but both are about as simple as this:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://www.example.com"))
if doc.inner_text.match(/someword/)
  puts "got it"
end

Both also let you search with XPath queries or CSS selectors. In Nokogiri, for example, doc.css('div.foo') returns every div with class foo, so you can pull items out of just those elements.

Fortunately, it's not a big leap to move between open-uri, Nokogiri, and Mechanize, so start with the first one that meets your needs and revise your code once you find you need the capabilities of one of the others.

JasonTrue
+1  A: 

For something simple like this I would prefer to write a couple of lines of code instead of using a full-blown gem. Here is what I would do:

require 'net/http'

# let's take the url of this page
uri = 'http://stackoverflow.com/questions/1878891/how-to-load-a-web-page-and-search-for-a-word-in-ruby' 

response = Net::HTTP.get_response(URI.parse(uri)) # => #<Net::HTTPOK 200 OK readbody=true>

# match the word Ruby
/Ruby/.match(response.body) # => #<MatchData "Ruby">

I would switch to a gem only if I needed to do more than this and would otherwise have to implement something that one of the gems already provides.
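
The same net/http approach can also check the response status and count occurrences rather than stopping at the first match. This is only a sketch: count_word and count_word_on_page are made-up helper names, and returning nil for any non-2xx response is my own assumption about what is convenient.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: number of case-insensitive, whole-word
# occurrences of `word` in `text`.
def count_word(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/i).length
end

# Hypothetical helper: fetch `url` and count occurrences in the body.
# Returns nil when the server does not answer with a 2xx status.
def count_word_on_page(url, word)
  response = Net::HTTP.get_response(URI.parse(url))
  return nil unless response.is_a?(Net::HTTPSuccess)

  count_word(response.body, word)
end

# Offline check of the counting logic; "rubyist" is not a whole-word match
puts count_word('Ruby, ruby and RUBY for every rubyist', 'ruby')  # prints 3
```

Calling count_word_on_page(uri, 'Ruby') with the uri from the snippet above would do the real fetch. Note that Net::HTTP.get_response does not follow redirects, so a 301 or 302 comes back as nil here.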

nas
Not an invalid answer to this question, but you may want to read the following: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Andrew Grimm
that's useful, thanks
nas