ansaurus

Question

Text parsing in Ruby

Answer 1

+1 A:

foo = "@BreakingNews: Typhoon Morakot hits Taiwan, China evacuates thousands http://news.bnonews.com/u4z3"
# Remove the URL along with the whitespace before it, so no trailing space is left behind.
r = foo.gsub(%r{\s*https?://\S+}, '')
puts r
# @BreakingNews: Typhoon Morakot hits Taiwan, China evacuates thousands
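
If the text might contain several links, or https links in the middle of a sentence, the same idea could be wrapped in small helpers. This is only a minimal sketch, assuming URLs are delimited by whitespace; strip_urls and extract_urls are hypothetical names made up for illustration, not part of any library:

```ruby
# Assumption: a URL starts with http:// or https:// and runs to the next whitespace.
URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: removes every URL, then collapses the leftover spaces.
def strip_urls(text)
  text.gsub(URL_PATTERN, '').squeeze(' ').strip
end

# Hypothetical helper: returns the URLs found in the text as an array of strings.
def extract_urls(text)
  text.scan(URL_PATTERN)
end

tweet = "@BreakingNews: Typhoon Morakot hits Taiwan, China evacuates thousands http://news.bnonews.com/u4z3"
puts strip_urls(tweet)
puts extract_urls(tweet).inspect
```

Note that \S+ is greedy, so punctuation directly after a link (such as a closing period) would be treated as part of the URL; whether that matters depends on the input.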

hobodave 2009-08-07 06:06:42

Answer 2

A:

It can be done in a quick and dirty way or in a more sophisticated way. Here is the sophisticated way:

require 'rubygems'
require 'hpricot' # you may need to install this gem
require 'open-uri'

## first, get the URL of the embedded/framed HTML file
start_url = 'http://news.bnonews.com/u4z3'
doc = Hpricot(open(start_url))
news_html_url = doc.at('//link[@href]').to_s.match(/(http[^"]+)/) 

## now get the news text; it is in the 3rd <p> tag of the framed HTML file
doc2 = Hpricot(open(news_html_url.to_s))
news_text = doc2.at('//p[3]').to_plain_text
puts news_text

Try to understand what the code does at each step so that you can apply it in future projects. These pages may help:

http://wiki.github.com/why/hpricot/an-hpricot-showcase

http://code.whytheluckystiff.net/doc/hpricot/

vulcan_hacker 2009-08-07 10:43:35

It doesn't appear you read the question at all.

hobodave 2009-08-07 15:46:51

@hobodave: I read it again, and it appears I did misunderstand the question the first time. I assumed the English was off and that he wanted to get the text from that link. I am sorry about that. It is a pretty simple problem, then.

vulcan_hacker 2009-08-10 07:03:27
