questions about hpricot | ansaurus

hpricot

Installing Hpricot on Ruby 1.9.1 on Windows

I am trying to install hpricot using the command: >gem install hpricot -v 0.8.2 Building native extensions. This could take a while... ERROR: Error installing hpricot: ERROR: Failed to build gem native extension. C:/Ruby19/bin/ruby.exe extconf.rb checking for stdio.h... * extconf.rb failed * Could not create Makefile due to some ...

Scraping hidden HTML (when visible = false) using Hpricot (Ruby on Rails)

Hi, I've come across an issue that I can't seem to get past. I'm also new to Ruby on Rails, hence the number of questions. I am attempting to scrape a webpage such as the following: http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo.aspx I would like to scrape The Addres...


hpricot in netbeans

Hi, I am trying to use Hpricot in JRuby. My problem is the following. If I have this code: #!ruby require 'hpricot' require 'open-uri' # load the RedHanded home page doc = Hpricot(open("http://redhanded.hobix.com/index.html")) where do I put it? Into my controller? Because it's not accepting it there. And if I'm supposed to put it...

hpricot problem

Hi, I am trying to use Hpricot in a controller. I would like to pass this value to an html.erb page so I can display it on the screen. So I wrote this: session[:allcars] = (doc/"td.car_title/text()") but this gives an error. When I tried this: puts (doc/"td.car_title/text()") it printed the cars to the console. So I can't under...

What is the best way to match id's against a regular expression in Hpricot?

Using Hpricot, it is pretty easy to see how I can extract all elements with a given id or class using a CSS selector. Is it possible to extract elements from a document based on whether some attribute of those elements matches a regular expression? ...

Ruby for romance? How to update a script from itself.

My wife enjoys it when I use my geek abilities to be "romantic" so I had an idea for a ruby script to install on her Mac that would send her quotes and little notes from me throughout the day. I already figured out that I'll be using GeekTool to run a script in the background and I'll use growlnotify to display the messages. Now what I n...

Update a single XML entity using Hpricot in Ruby?

I am going to be using Hpricot to process an XML file. I want to randomly display some quotes from the file, and then I want to keep track of how often each quote has been displayed. Is it possible for me to update a single item within the XML file using Hpricot (or is there some other solution that can do this for me?) or should I just...
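
One way to approach the "some other solution" part of this question, sketched with REXML (which ships with standard Ruby installs) rather than Hpricot. The `<quotes>`/`<quote>` layout and the `shown` counter attribute are hypothetical, since the excerpt does not show the real file.

```ruby
require 'rexml/document'

# Hypothetical file contents; the real XML layout is not shown in the question.
xml = <<~XML
  <quotes>
    <quote shown="0">First quote</quote>
    <quote shown="2">Second quote</quote>
  </quotes>
XML

doc = REXML::Document.new(xml)

# Pick one quote at random and bump its display counter in place.
quotes = doc.root.get_elements('quote')
chosen = quotes.sample
chosen.attributes['shown'] = (chosen.attributes['shown'].to_i + 1).to_s

puts chosen.text

# Serialize the whole document again; writing this string back to the file
# (for example with File.write) would persist the updated counter.
updated_xml = doc.to_s
```

Only the chosen element is modified in memory, but the file itself still has to be rewritten as a whole.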

Loading an hpricot element with a chunk of html

Is there a way to load a chunk of HTML into an Hpricot::Doc object? I am trying to parse various chunks of HTML within custom tags from a page. So if I have: <foo> <b>here is some stuff</b> <table> <tr> <td>one</td> <td>two</td> </tr> <tr> <td>three</td> <td><four</td> </tr> </table> </foo...

XPath and Hpricot -- works on some machines, not others?

The following Hpricot code successfully extracts the STPeriods in the XML on two of my machines (Vista and an Ubuntu server) but fails on another Ubuntu laptop. All machines have Hpricot v0.82. Any ideas? Totally stumped. Hpricot code: (doc/"WeatherFeed/Location/WxShortTerm/STPeriod").each do |ham_forecast| XML file: <?xml version=...

Searching Hpricot with Regex

I'm trying to use Hpricot to get the value within a span with a class name I don't know. I know that it follows the pattern "foo_[several digits]_bar". Right now, I'm getting the entire containing element as a string and using a regex to parse the string for the tag. That solution works, but it seems really ugly. doc = Hpricot(open("ht...

How to remove event attributes from HTML using Hpricot?

I want to remove a list of DOM event attributes from HTML. How can I do this? For example: before = '<div onclick="abc" >abc</div>' after = clean_it(before) # after => "<div>abc</div>" DOM_EVENT_TO_BE_REMOVE = "onclick|ondblclick|onerror|onfocus|onkeydown" # I want to remove these events # I want to do it like this: def clean_it(html...
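
A minimal sketch of what `clean_it` could look like. It swaps in REXML for Hpricot and assumes the input is a single well-formed fragment, so it illustrates the idea and is not an Hpricot answer. `EVENT_PATTERN` is a name introduced here for illustration; the attribute list comes from the question.

```ruby
require 'rexml/document'

# Attribute names to strip, as listed in the question.
DOM_EVENT_TO_BE_REMOVE = "onclick|ondblclick|onerror|onfocus|onkeydown"

# Hypothetical helper pattern: match a whole attribute name, case-insensitively.
EVENT_PATTERN = /\A(?:#{DOM_EVENT_TO_BE_REMOVE})\z/i

# Assumes `html` is one well-formed fragment. REXML is an XML parser and
# raises on sloppy real-world HTML.
def clean_it(html)
  doc = REXML::Document.new(html)
  REXML::XPath.each(doc, '//*') do |element|
    # Take a copy of the names first so nothing is deleted mid-iteration.
    element.attributes.keys.each do |name|
      element.delete_attribute(name) if name =~ EVENT_PATTERN
    end
  end
  doc.to_s
end

puts clean_it('<div onclick="abc" >abc</div>')
```

Attributes that are not in the list, such as `href` or `class`, are left untouched.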

Hpricot error parsing special characters in URI

I'm working on a Ruby script to grab historical stock prices from Yahoo, using Hpricot to parse the pages. This is mostly straightforward: the URL is "http://finance.yahoo.com/q/hp?s=TickerSymbol". For example, to look up Google, I would use "http://finance.yahoo.com/q/hp?s=GOOG". Unfortunately, it breaks down when I'm looking up the price...
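
The excerpt is cut off before it names the failing symbol, so the following is only a sketch under an assumption: that the ticker contains a character that is not valid in a URL, such as `^`. In that case, percent-encoding the ticker before building the URL may help. `history_url` and the `^DJI` ticker are hypothetical names used for illustration.

```ruby
require 'uri'

BASE_URL = 'http://finance.yahoo.com/q/hp'

# Build the history URL with the ticker percent-encoded for the query string.
def history_url(ticker)
  "#{BASE_URL}?s=#{URI.encode_www_form_component(ticker)}"
end

puts history_url('GOOG')  # http://finance.yahoo.com/q/hp?s=GOOG
puts history_url('^DJI')  # http://finance.yahoo.com/q/hp?s=%5EDJI
```

The encoded string can then be passed to `open` as before.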

How to get an element using inner text (Watir, Nokogiri, Hpricot)

I have been experimenting with Watir, Nokogiri and Hpricot. All of these use a top-down approach, which is my problem, i.e. they use the element type to search for an element. I want to find the element using its text without knowing the element type. E.g. <element1> <element2> Text2 </element2> <element3> Text3 </element3> text4 </elem...

How do I add text to an empty element in Hpricot?

If I have an empty tag: <tag/> How can I add text so that I end up with: <tag>Hello World!</tag> I can only seem to swap the whole tag with different content or add content before/after it. ...

hpricot throws exception when trying to parse url which has noscript tag

I use the hpricot gem in Ruby on Rails to parse a webpage and extract the meta tag contents. But if the website has a <noscript> tag just after the <head> tag, it throws an exception: Exception: undefined method `[]' for nil:NilClass I even tried updating the gem to the latest version, but the result is still the same. This is the sample code I use. r...

How to translate this Hpricot code to Nokogiri?

Hpricot(html).inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ") hpricot = Hpricot(html) hpricot.search("script").remove hpricot.search("link").remove hpricot.search("meta").remove hpricot.search("style").remove Found it on http://www.savedmyday.com/2008/04/25/how-to-extract-text-from-html-using-rubyhpricot/ ...
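
The whitespace clean-up in that snippet does not depend on the parser, so it carries over unchanged whichever library produces the text. A small sketch in plain Ruby, using a hypothetical sample string in place of the parser's `inner_text` result:

```ruby
# Stand-in for the text a parser's inner_text call would return (hypothetical sample).
raw_text = "Hello\r\n   world,\n\nthis is\ttext.  "

# The chain from the question.
original = raw_text.gsub("\r", " ").gsub("\n", " ").split(" ").join(" ")

# String#split with no argument already splits on any run of whitespace,
# including \r and \n, so the two gsub calls can be dropped.
shorter = raw_text.split.join(" ")

puts shorter  # Hello world, this is text.
```

Both expressions give the same result for this input; the shorter form is offered as a simplification, not as the original author's method.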

CSS selector exclude elements, hpricot

Hey, I am trying to write a CSS selector that selects everything except the script elements with Hpricot. I can easily select all the contents of the select-me div and then remove the script elements, but I was wondering if it's possible to use a selector which will exclude the script elements: <div class='select-me'> <p>This is some ...

hpricot using java?

I've just noticed that a lot of Hpricot code is written in Java... I heard that JRuby performs a lot better than native Ruby when processing regular expressions. Are the Java classes perhaps only activated if JRuby or Java is installed, with the Ruby code used if they are not found? It's puzzling indeed. Thanks ...

How can I get Hpricot to play nice with HTML5?

I am using Hpricot to parse a theme file. I have noticed, however, that if I feed a valid HTML5 document into Hpricot(), it auto-closes HTML5 tags (like <section>), and messes with the DOCTYPE. Are there any extensions to Hpricot, or perhaps a flag I need to set, that will allow HTML5 documents to be parsed correctly? ...

nokogiri vs hpricot?

Which one would you choose? My important attributes are (not in order): support and future enhancements; community and general knowledge base (on the Internet); comprehensiveness (i.e. proven to parse a wide range of *.*ml pages); performance; memory footprint (runtime, not the code-base) ...
