hpricot

Ruby - Writing Hpricot data to a file

Hey everyone, I am currently doing some XML parsing and I've chosen to use Hpricot because of its ease of use and syntax; however, I am running into some problems. I need to write a piece of XML data that I have found out to another file. However, when I do this the format is not preserved. For example, if the content should look like t...

remove <font> tag using hpricot

The HTML looks like this: "[font color="#FF0000"]test [font color="#FF0000"]Hello world[/font][/font]" I want to remove the font tags so that the result looks like this: test Hello world. Thank you ...

hpricot segfault?

Any idea why hpricot might segfault on this page? trial_url = 'http://www.controlled-trials.com/ISRCTN56071145/' doc = Hpricot(open(trial_url)) produces: /Users/ap257/.gem/ruby/1.8/gems/hpricot-0.8.2/lib/hpricot/parse.rb:33: [BUG] Segmentation fault ruby 1.8.7 (2009-06-08 patchlevel 173) [universal-darwin10.0] Abort trap Please cou...

how to remove html element's style attribute using Hpricot?

Like this: <p style="font-size: 12pt;"> Hello world <span style="font-weight: bold;">just do it</span> </p> I want to remove every element's "style" attribute, so that the result looks like this: <p>Hello world <span>just do it</span></p> How can I do this using Hpricot? Thanks. OK, I have solved it as shown below: doc = Hpricot("<p st...

Ruby: Computed Style for a webpage

I'm using Hpricot to parse an HTML page, but I need to get the computed styles for each element. For example, if I have an h1 Hpricot element and the page's external CSS defines a background-image for h1 elements, how can I find out what the background-image is? ...

Scraping pages with asynchronous responses with Hpricot

Hi there, I'm trying to scrape a page but the initial response has nothing in the body as the content is pumped in asynchronously, e.g. the results from a search on the apple website: http://www.apple.com/uk/search/?q=searching+for+something&sec=global Any ideas on how I can successfully grab the results from the search with hprico...

escape colon in XPath search

Hi, I'm using Hpricot with Selenium. I have this HTML input element: <input id="foo:bar"/> And I'm trying to get this value with this XPath expression: source = Hpricot(@selenium.get_html_source) source.search("//input[@id='foo:bar']") but it is not finding anything because of the colon. I have seen that the XPath expression cannot...

Encoding problems with hpricot

I am getting the following encoding error when trying to scrape web pages with hpricot in ruby 1.9: Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8 I can reproduce the error by doing the following: ska:~ sam$ rvm 1.9.2@hpricot ska:~ sam$ ruby -v ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-dar...

How to replace a node with a Ruby string?

Hi everyone, I'm trying to replace all the <img> tags in an HTML file with the <%= image_tag() %> Rails tag. What I want to do is something like: doc = open("myfile.html") { |f| Hpricot(f) } imgs = doc.search("//img") # here I get all Hpricot::Elements imgs.each { |i| # fake function name ! i.replace_by_HTML('<%= image_tag("/images/blab...

Remove an element's class attribute with Hpricot

How do I do it? E.g., <span class="selected" id="hi">HELLO</span> should become <span id="hi">HELLO</span> ...

Parsing problem with hpricot

I have XML that looks like this: <data> <image src="http://www.someweb.com/something.png"/> </data> What is the correct way to use hpricot to extract just 'http://www.someweb.com/something.png'? The closest I can get is this... >>(doc/"image").first => {emptyelem <image src="http://www.someweb.com/something.png">} I've re...
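For comparison, here is a minimal sketch of reading that attribute using REXML, which ships with Ruby, rather than Hpricot itself. The helper name `image_src` is hypothetical, and the XML string is simply the snippet from the question.

```ruby
require "rexml/document"

# Hypothetical helper: returns the src attribute of the first <image>
# element, or nil when no such element exists.
# Note: this uses REXML, not Hpricot.
def image_src(xml)
  doc = REXML::Document.new(xml)
  image = REXML::XPath.first(doc, "//image")
  image && image.attributes["src"]
end

xml = '<data><image src="http://www.someweb.com/something.png"/></data>'
puts image_src(xml)
```

The general idea is the same in any parser: locate the element first, then read the attribute from it instead of printing the element.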

Can I use Hpricot to find the main article text of any/most websites?

I need a way of extracting the main text from any webpage that displays an article. Similar to the way that Readability can find the main text on any website that it's run on. I'm using Ruby on Rails, so I think Hpricot is my best bet. Is what I'm looking for possible in Hpricot? Is there an example somewhere? Thanks for reading. ...

Getting image attributes via Hpricot

Hi all: I'm trying to get the largest image off a page I parse with Hpricot and am not having any luck. How do I access the width and height attributes of an img tag with it? Thanks...Chris ...

Ruby Hpricot RegEx replace <BR>'s with <P>'s

Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot & RegEx? // Replace all doubled-up <BR> tags with <P> tags, and remove fonts. var pattern = new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g"); document.body.innerHTML = document.body.innerHTML.replace(pattern, "</p><p>").replace(/<\/?font[^>]*>/g, ...
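One possible plain-Ruby translation of the two replacements is sketched below, using `String#gsub` on the raw HTML string and no Hpricot calls. The helper name `brs_to_paragraphs` is hypothetical, and replacing the font tags with an empty string is an assumption based on the "remove fonts" comment, since the original call is cut off.

```ruby
# Hypothetical helper: collapses doubled-up <br> tags into a paragraph
# break and strips <font> tags.
# Assumption: font tags are replaced with "" (the original second
# argument is not visible in the excerpt).
def brs_to_paragraphs(html)
  html
    .gsub(/<br\s*\/?>\s*<br\s*\/?>/i, "</p><p>") # two <br>s, optional whitespace between
    .gsub(/<\/?font[^>]*>/i, "")                 # opening and closing font tags
end

sample = %(<p><font color="red">one</font><br/>\n<br>two</p>)
puts brs_to_paragraphs(sample)
# prints: <p>one</p><p>two</p>
```

Ruby's `gsub` replaces every match, so no equivalent of the JavaScript "g" flag is needed; the `i` flag is added here so that `<BR>` and `<br>` are both matched.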

hpricot add attribute to an HTML tag?

Can someone please explain how to add a custom attribute to an HTML tag using Ruby with the Hpricot gem? I have a tag that looks like this: <div class="test" id="tag1" style=""> and I want to add a custom integer attribute called 'Readable=0' so it looks like this: <div class="test" id="tag1" style="" readable=0> Is this possible? ...

how does one remove <![CDATA[ ]]> tags from around text in XML using Hpricot?

I just want the text out of there without those tags. Does Hpricot.XML have any methods for this? ...

Hpricot and Rails

I am completely new to Ruby and Rails... in fact, I created my first application in Rails today. It makes an HTTP request to pull back an XML document and then outputs it to the screen: something simple to get started. I now need to parse the XML string, but I am lost on how to do that with Hpricot. Here is my code so...

Search XML Nodes for element in Ruby and Hpricot

I am trying to write a Rails app that takes an XML object and then iterates over the object showing the user the information contained inside the different nodes. I am completely new to Rails, coming from a PHP background and am having some trouble with a particular function. I need to basically say, if this node exists die, if it does...

Can any of Ruby's HTML Parsers do Javascript to see the resulting DOM?

When trying Hpricot and Nokogiri, the HTML can be fetched and parsed, but can they also execute the JavaScript so that the content shows on the page (that is, shows up in the DOM)? I ask because some pages won't show the info unless the JavaScript interpreter has run. ...

XML to hash table in Ruby: Parsing list of historical inventions.

I'd like to slurp the following data about historical inventions into a convenient Ruby data structure: http://yootles.com/outbox/inventions.xml Note that all the data is in the XML attributes. It seems like there should be a quick solution with a couple lines of code. With Rails there'd be Hash.from_xml though I'm not sure that would...