Parsing HTML with Hpricot & Ruby - getting the innermost html? | ansaurus

Q:

Parsing HTML with Hpricot & Ruby - getting the innermost html?

I'm looking to parse some old HTML that has plenty of extraneous tags that could be done with CSS now: <b>, <font>, etc. I'm using Hpricot to parse it, but I want to get the innermost "inner_html". How does one do that with Hpricot? For example, let's say I use Hpricot to grab all the <table> elements, which I loop through to get the rows and cells. I want the data inside the cells, but a cell can hold either bare text or wrapped content like <b><font ...>1,000</font></b>. Is there a trick to getting just the "1,000" out?

Thanks,
Ben

+1 A:

I'm not sure if this is completely what you want, but you might want to look at the inner_text method. It returns the same content as inner_html, but with all of the HTML tags removed, so only the text remains.

AboutRuby 2010-10-09 03:13:30
