I'm looking to parse some old HTML that has plenty of extraneous tags that could be handled with CSS now: <b>, <font>, etc. I'm using Hpricot to parse it, but I want to get the innermost "inner_html". How does one do that with Hpricot? For example, let's say I use Hpricot to grab all the <table> elements, which I loop through to get the rows and cells. I want the data inside the cells, but a cell may contain either plain text with no additional tags or something like <b><font ...>1,000</font></b>. Is there a trick to getting just the "1,000" out?
Thanks,
Ben