Hi, I am new to the world of Ruby and Rails.

I watched Railscasts episode 190 and have just started playing with Nokogiri. I used SelectorGadget to find the CSS selectors and XPath expressions.

I have the following code:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.telegraph.co.uk/sport/football/rss"
doc = Nokogiri::HTML(open(url))
doc.xpath('//a').each do |paragraph|
  puts paragraph.text
end

When I extracted text from a normal HTML page with CSS selectors, the extracted text was printed to the console.

But when I try the same thing, with either CSS or XPath, on the RSS feed at the URL in the code above, I don't get any output.

How do you extract text from RSS feeds?

I also have another question.

Is there a way to extract text from two different feeds and display it on the console? Something like:

url1 = "http://www.telegraph.co.uk/sport/football/rss"
url2 = "http://www.telegraph.co.uk/sport/cricket/rss"

Looking forward to your help and suggestions.

Thank You

Gautam

A: 

Do you have these installed: libxml2, libxml2-dev, libxslt, libxslt-dev?

Pragnesh Vaghela
A: 

An RSS page is not an HTML document; it is XML, so you should use Nokogiri::XML(open(url)).

Then view the source of the RSS page. There are no <a> elements.

All links in the document are created with the <link> tag:

<link>http://www.telegraph.co.uk/sport/football/world-cup-2010/teams/france/7769203/France-2-Costa-Rica-1-match-report.html</link>

The link to each article is also duplicated in its <guid> tag, because in this feed an article's ID is its URL.

<guid>http://www.telegraph.co.uk/sport/football/world-cup-2010/teams/france/7769203/France-2-Costa-Rica-1-match-report.html</guid>

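So, if you need all the links in the document, use:

require 'nokogiri'
require 'open-uri'

url = "http://www.telegraph.co.uk/sport/football/rss"
# URI.open is the open-uri entry point on Ruby 2.5+; a bare open(url) no longer fetches URLs on Ruby 3.
doc = Nokogiri::XML(URI.open(url))
doc.xpath('//link').each do |link|
  puts link.text
end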

If you need only links to articles, use doc.xpath('//guid')

As for multiple feeds, just use a loop:

feeds = [
  "http://www.telegraph.co.uk/sport/football/rss",
  "http://www.telegraph.co.uk/sport/cricket/rss"
]
feeds.each do |url|
  # the same code as before goes here
end
Voyta
A: 

If you are processing feeds, you should use Feedzirra:

http://railscasts.com/episodes/168-feed-parsing

http://github.com/pauldix/feedzirra

Works like a charm.

Good luck!

Jonathan
A: 

No need for the loop... simply

puts doc.xpath('//link/text()')

will print all link text.

Mark Thomas