hpricot

Parsing HTML with Hpricot & Ruby - getting the innermost html?

I'm looking to parse some old HTML that has plenty of extraneous tags whose styling could be done with CSS now - <b>, <font>, etc. I'm using Hpricot to parse it, but I want to get the innermost "inner_html" - how does one do that with Hpricot? For example, let's say I use Hpricot to grab all the <table> elements, which I loop through to get the r...

RoR: Storing HTML in a File for Later Use

Hi all, in our application we have different themes, and each theme has its own default content in the following structure: ROWS COLUMNS CONTENT HTML DATA 1 CONTENT HTML DATA 2. There could be multiple rows, columns, and content elements. We need to store this data in a file (manually) and then read & dump it ...

Hpricot CSS search: How do I select the parent/ancestor of a particular element using a string selector?

I'm using Hpricot's CSS search to identify a table within a web page. Here's a sample HTML snippet I'm parsing: <table height=61 width=700> <tbody> <tr> <td><font size=3pt color = 'Blue'><b><A NAME=a1>Some header text</A></b></font></td></tr> ... </tbody></table> There are lots of tables in the page. I want to find the table which co...

Difficulty installing Hpricot on Ruby 1.9.2

I am using Ruby 1.9.2 and I have an assignment to use Hpricot. I have tried to install the hpricot gem and I get error messages: extconf.rb failed. Could not create makefile. Check the mkmf.log. But I cannot find mkmf.log. I checked the other answers and tried to install using gem install hpricot-platform=mswin32. That didn't work either. ...

Getting portion of href attribute using hpricot

I think I need a combo of Hpricot and regex here. I need to search for 'a' tags with an 'href' attribute that starts with 'abc/', and return the text following that until the next forward slash '/'. So, given: <a href="/abc/12345/xyz123/">One</a> <a href="/abc/67890/xyzabc/">Two</a> I need to get back: '12345' and '67890'. Can anyo...
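
The regex half of this can be sketched in plain Ruby, without Hpricot. This is only an illustration under assumptions: it takes the href values as already extracted into an array of strings, and the helper name `segment_after_abc` and the `hrefs` array are hypothetical, not part of any library. The pattern allows an optional leading slash, since the question says 'abc/' while the sample markup uses '/abc/'.

```ruby
# Hypothetical helper: return the path segment that follows "abc/" in an href,
# or nil when the href does not start with "abc/" (optionally preceded by "/").
def segment_after_abc(href)
  match = href.match(%r{\A/?abc/([^/]+)})
  match && match[1]
end

# Sample href values copied from the question's example markup.
hrefs = ["/abc/12345/xyz123/", "/abc/67890/xyzabc/"]

# Map each href to its captured segment and drop non-matching entries.
segments = hrefs.map { |href| segment_after_abc(href) }.compact
puts segments.inspect
```

Anchoring the pattern with `\A` keeps it from matching 'abc/' in the middle of an unrelated path; how the href strings are collected from the parsed document is left out here on purpose.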