Hi,

How can I use Nokogiri to split the following HTML into text nodes? I want to split the content using the <br/> tag as a delimiter, or, sadly, an unclosed <br>, which is often what I find in the HTML I am scraping.

An example of the HTML would be:

    <td>
      <font size="2" face="Arial"><b>HALL (J&amp;E) LTD</b><br>
        Head Office<br>
        Questor House<br>
        191 Hawley Road<br>
        Dartford<br>
        Kent <br>
        DA1 1PU<br>
        <br>
        <b>Tel:</b>&nbsp; +44 (0)1322 223456<br>
        <b>Fax:</b> +44 (0)1322 291458<br>
        <br>
        <b>Website:</b>
        <a target="_blank" href="http://www.jehall.co.uk" style="text-decoration: none">
        www.jehall.co.uk</a><br>
        <b>Email:</b>&nbsp;&nbsp;&nbsp;&nbsp;
        <a href="mailto:[email protected]?subject=Enquiry from Defence Suppliers Directory&[email protected]" style="text-decoration: none">
          [email protected]</a>
      </font>
    </td>

I want to scrape the contact details into an array, so splitting on <br> would be useful.

Cheers

Paul