I'm looking to parse some old HTML that has plenty of extraneous tags that could be handled with CSS now: <b>, <font>, etc. I'm using Hpricot to parse it, but I want to get the innermost "inner_html". How does one do that with Hpricot? For example, let's say I use Hpricot to grab all the <table> elements, which I loop through to get the rows and cells. I want the data inside the cells, but a cell may contain either bare text or markup like <b><font ...>1,000</font></b>. Is there a trick to getting just the "1,000" out?

Thanks,
Ben

+1  A: 

I'm not sure if this is completely what you want, but you might want to look at the inner_text method. It returns the same content as inner_html, except that all of the HTML tags are removed, so a cell containing <b><font ...>1,000</font></b> gives you just "1,000".

AboutRuby