I've been playing around with Hpricot, but after a fair amount of searching I haven't been able to work this out.

I'm trying to parse an HTML page and find all a tags whose href points to an mp3 file. So far I've got:

<ul>
    <% @page.search('//a[@href*=mp3]').each do |link| %>    
        <li>
            <%= link.inner_text %>
        </li>
    <% end %>
</ul>

which is working fine, and a regex, /href\s*=\s*\"([^\"]+)(.mp3)/, which also works. I'm just not sure how to combine the two.

Is there a good example, or documentation that someone could point me to, so I can work out what I can do with the .search method?

Thanks

A: 

Found the answer. The method is attributes (not attr), and the brackets need to be square: link.attributes['href']

Aaron Moodie
+1  A: 

You can access the attribute href with

link.attr('href')

As a CSS3 selector, you might want to consider @href$=.mp3 (instead of *=), as it matches only attributes that end in .mp3.
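To illustrate the difference between the two operators, here is a rough plain-Ruby analogy. It does not call Hpricot at all; it only mimics the matching rules with String#include? and String#end_with?, and the sample hrefs are made up for the example.

```ruby
# Made-up href values, for illustration only.
hrefs = [
  '/audio/song.mp3',
  '/mp3-archive/index.html',
  '/audio/song.mp3?download=1'
]

# Roughly what *= tests: the value contains the substring anywhere.
contains = hrefs.select { |href| href.include?('mp3') }

# Roughly what $= tests: the value ends with the given suffix.
ends_with = hrefs.select { |href| href.end_with?('.mp3') }

puts "contains 'mp3':    #{contains.inspect}"
puts "ends with '.mp3':  #{ends_with.inspect}"
```

Note that a suffix match also skips hrefs that carry a query string after the file name, which may or may not be what you want.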

Edit: You're right, sorry. I found out that attr is only an alias for set on Hpricot::Elements. The right way is indeed:

link.attributes['href']
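A minimal sketch of how this could be used to collect both the link text and the href in one pass. The helper name mp3_links is hypothetical, and page is assumed to be an Hpricot document like the @page object in the question; the extra end_with? check is an optional addition that drops hrefs which merely contain "mp3" somewhere.

```ruby
# Hypothetical helper, not part of Hpricot.
# `page` is assumed to be an Hpricot document (e.g. @page from the question),
# i.e. something that responds to #search and yields elements that respond to
# #attributes and #inner_text.
def mp3_links(page)
  page.search('//a[@href*=mp3]').map do |link|
    href = link.attributes['href']

    # The selector matches "mp3" anywhere in the href,
    # so keep only values that really end in .mp3.
    next unless href.to_s.end_with?('.mp3')

    { href: href, text: link.inner_text.strip }
  end.compact
end
```

In the view from the question, each entry could then be rendered as a link by using the :href value for the href attribute and the :text value for the link text.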

Nevertheless, I would recommend Nokogiri as a faster substitute for Hpricot.

andre-r
Thanks andre-r. I'm getting the error undefined method `attr' when I use that method. I've included both the Hpricot and open-uri gems. Is there something I'm missing?
Aaron Moodie