hpricot

XML => HTML with Hpricot and Rails

I've never worked with web services and rails, and obviously this is something I need to learn. I've chosen to use hpricot because it looks great. Anyway, _why's been nice enough to provide the following example on the hpricot website: #!ruby require 'hpricot' require 'open-uri' # load the RedHanded home page doc = Hpricot(open("ht...

How do I get Hpricot 0.6 Gem Built on FreeBSD?

When I run rake gems:build with hpricot 0.6.164 on my FreeBSD server I get: Error: Failed to build gem native extension. /user/localbin/ruby18 extconf.rb gems:build RB_USER_INSTALL checking for main() in -lc... yes creating Makefile make make install /usr/bin/install -c -o root -g wheel -m 0755 hpricot_scan.so /u...

Where is _why?

I have been trying to reach the sources and references for Hpricot at code.whytheluckystiff.net for a long time now. Do you know if the site, the repository or the author moved to a different location, or if it is only a transient situation? Best regards and happy coding ...

Nokogiri (RubyGem): Find and replace HTML tags

I have the following HTML: <html> <body> <h1>Foo</h1> <p>The quick brown fox.</p> <h1>Bar</h1> <p>Jumps over the lazy dog.</p> </body> </html> ...and by using the RubyGem Nokogiri (a hpricot replacement), I'd like to change it into the following HTML: <html> <body> <p class="title">Foo</p> <p>The quick brown fox.</p> <p class="title"...

Parsing an HTML table using Hpricot (Ruby)

I am trying to parse an HTML table using Hpricot but am stuck: I am not able to select a table element from the page which has a specified id. Here is my Ruby code: require 'rubygems' require 'mechanize' require 'hpricot' agent = WWW::Mechanize.new page = agent.get('http://www.indiapost.gov.in/pin/pinsearch.aspx') form = page.forms.find...

Looking for a recommendation of a good tutorial on best practices for a web scraping project?

I need to do a fairly extensive project involving web scraping and am considering using Hpricot or Beautiful Soup (i.e. Ruby or Python). Has anyone come across a tutorial that they thought was particularly good on this subject that would help me start the project off on the right foot? ...

Installing hpricot for JRuby

I'm trying to look at Cucumber for JRuby on Rails. One of the prerequisites is Webrat, which in turn has hpricot as a prerequisite. I've installed the gem with hpricot using: gem install hpricot --source http://code.whytheluckystiff.net --version 0.6.1 --platform java This installs the Java version of hpricot. I add the hpricot_scan.jar to the...

hpricot with firebug's XPath

I'm trying to extract some info from a table-based website with hpricot. I get the XPath with Firebug. /html/body/div/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[3]/tbody/tr This doesn't work... Apparently, Firebug's XPath is the path of the rendered HTML, and not the actual HTML from the...

Segmentation fault in hpricot

I'm using hpricot to read HTML. I got a segmentation fault error. I googled, and some say to upgrade to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How do I resolve this error? ...

Is there anything like hpricot or beautiful soup for php?

Possible Duplicate: Robust, Mature HTML Parser for PHP I am looking for a good way to parse and modify HTML documents server-side in PHP. Beautiful Soup and hpricot look like very good tools, but they are not available for PHP. Are there any good libraries that can do this in PHP? Tidy appears to be partially what I am looking fo...

Action caching not working

I'm fetching and manipulating XML from Twitter and Flickr in my Rails app. The results appear on every page and the parsing is handled in the Application Controller with Hpricot and open-uri. This is my first experiment with action caching and it doesn't seem to be working. I'm in dev mode using WEBrick. Everything appropriate is set to...

Hpricot Element intersection

Hi, I want to remove all images from an HTML page (actually TinyMCE user input) which do not meet certain criteria (class = "int" or class = "ext") and I'm struggling with the correct approach. This is what I'm doing so far: hbody = Hpricot(input) @internal_images = hbody.search("//img[@class='int']") @external_images = hbody.search("//...

Non greedy searches with Hpricot?

I'm using Hpricot for traversing an XML packet. For each node I'm on, I want to get a list of the immediate children. However, when using (current_node/:section) I'm getting ALL descendant sections, not just the immediate children. How can I get around this? ...

Getting Rails to play with Hpricot

I'm trying to get Hpricot working with Rails on my dev machine. I've installed Hpricot [0.8.1] using the standard 'gem install hpricot' and confirmed it works fine with my standard Ruby installation [1.8.7]; however when I try the same with my Rails [2.1.0] installation, I get an error - TypeError: superclass mismatch for class BogusET...

Hpricot CSS Class search

Hey guys. I am working on some code that scrapes a page for two CSS classes. I am simply using the Hpricot search method for this, like so: webpage.search("body").search("div.first_class | div.second_class") ...for each item found I create an object and put it into an array. This works great except for one thing. The search...

Removing anything between XML tags and their content

I need to remove anything between XML tags, especially whitespace and newlines. For example, removing whitespace and newlines from: </node> \n<node id="whatever"> to get: </node><node id="whatever"> This is not meant for parsing XML by hand, but rather to prepare XML data before it gets parsed by a tool. To be more s...

Hpricot, Get all text from document

Hi Guys, I have just started learning Ruby. Very cool language, liking it a lot. I am using the very handy Hpricot HTML parser. What I am looking to do is grab all the text from the page, excluding the HTML tags. Example: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Data Protectio...

How do you know when to use an XML parser and when to use ActiveResource?

I tried using ActiveResource to parse a web service that was more like an HTML document and I kept getting a 404 error. Do I need to use an XML parser for this task instead of ActiveResource? My guess is that ActiveResource is only useful if you are consuming data from another Rails app and the XML data is easily translatable to a Rail...

libxml-ruby parsing HELP...

Alright, I'm switching from a working Hpricot setup to libxml-ruby because of speed and, well, the disappearance of _why. I looked at Nokogiri for a second but decided on libxml-ruby for speed and longevity. I must be missing something basic, but what I'm trying to do isn't working. Here's my XML string: file =<<XML <?xml version="1.0" encoding="utf-8...

ruby noob: /usr/lib/ruby/1.8/rss/rss.rb:922:in `have_required_elements?': undefined method

Sorry, this might be a basic/stupid/noob question. I am just trying to tweak an existing Ruby script: it runs fine on my Mac, but fails to run on Ubuntu 9.04. The error is this: /usr/lib/ruby/1.8/rss/rss.rb:922:in `have_required_elements?': undefined method `have_required_elements?' for "App Store Reviews for ":String (NoMethodError)...