I'm looking to parse some old HTML that has plenty of extraneous tags that could be handled with CSS now - <b>, <font>, etc. I'm using Hpricot to parse it, but I want to get the innermost "inner_html". How does one do that with Hpricot? For example, let's say I use Hpricot to grab all the <table> elements, which I loop through to get the r...
Hi All,
In our application we have different themes and each theme has its own default content in the following structure:
ROWS
  COLUMNS
    CONTENT
      HTML DATA 1
    CONTENT
      HTML DATA 2
There could be multiple row, column, and content elements. We need to store this data in a file (manually) and then read & dump it ...
I'm using Hpricot's CSS search to identify a table within a web page. Here's a sample HTML snippet I'm parsing:
<table height=61 width=700>
<tbody>
<tr>
<td><font size=3pt color = 'Blue'><b><A NAME=a1>Some header text</A></b></font></td></tr>
...
</tbody></table>
There are lots of tables in the page. I want to find the table which co...
I am using Ruby 1.9.2 and I have an assignment to use Hpricot. I tried to install the hpricot gem and got error messages: extconf.rb failed, could not create Makefile, check mkmf.log. But I cannot find mkmf.log.
I checked the other answers and tried installing with gem install hpricot --platform=mswin32. That didn't work either.
...
I think I need a combo of Hpricot and regex here. I need to search for 'a' tags with an 'href' attribute that starts with '/abc/' and return the text following that, up to the next forward slash '/'.
So, given:
<a href="/abc/12345/xyz123/">One</a>
<a href="/abc/67890/xyzabc/">Two</a>
I need to get back:
'12345'
and
'67890'
Can anyo...
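For the two sample links in this last question, the extraction step can be sketched with a regular expression alone. This is only a minimal sketch under some assumptions: it scans the raw markup with Ruby's built-in String#scan rather than going through Hpricot, it assumes each href is quoted and begins with "/abc/" as in the sample, and the method name segments_after_abc is hypothetical, chosen here for illustration.

```ruby
# Sample markup copied from the question.
html = <<~HTML
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
HTML

# Hypothetical helper (not part of Hpricot): for every <a> tag whose href
# begins with "/abc/", capture the path segment up to the next "/".
def segments_after_abc(markup)
  # scan returns one single-element array per match because the pattern
  # has one capture group; flatten turns that into a plain list of strings.
  markup.scan(%r{<a\s[^>]*href=["']/abc/([^/"']+)/}i).flatten
end

puts segments_after_abc(html).inspect
```

If the links are already being collected through a parser, the same capture pattern could presumably be applied to each href value instead of to the whole page; links whose href does not begin with "/abc/" are simply skipped by the scan.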