ansaurus

Question

Extract all links from an HTML page, excluding links from a specific table

Answer 1

+2 A:

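Use WWW::Mechanize's update_html method to remove that table before finding the links. This method lets you replace the HTML held in $mech->content with an edited copy, so you can do whatever you want to the source, and later calls such as $mech->links then work on the modified page.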
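A minimal sketch of this approach, assuming the unwanted table can be identified by an id attribute and contains no nested tables. The helper names strip_table and links_outside_table, the regex, and the command-line arguments are illustrative assumptions; only new, get, content, update_html and links are WWW::Mechanize's own methods.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Remove the <table> whose id attribute equals $id from an HTML string.
# Naive regex (an assumption of this sketch): it breaks on nested tables.
sub strip_table {
    my ($html, $id) = @_;
    $html =~ s{<table\b[^>]*\bid\s*=\s*["']\Q$id\E["'][^>]*>.*?</table\s*>}{}gis;
    return $html;
}

# Fetch $url, drop the table, and return the absolute URLs of the remaining links.
sub links_outside_table {
    my ($url, $id) = @_;
    my $mech = WWW::Mechanize->new;
    $mech->get($url);
    # update_html replaces the stored HTML, so links() sees the edited page.
    $mech->update_html( strip_table($mech->content, $id) );
    return map { $_->url_abs->as_string } $mech->links;
}

# Runs only when a URL and a table id are given on the command line.
if (@ARGV == 2) {
    print "$_\n" for links_outside_table(@ARGV);
}
```

If the table has no stable id or class, the regex has nothing reliable to match on, and a tree-based parser is likely the safer tool.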

AmbroseChapel 2010-09-13 21:06:41

Thanks! But it turns out that deleting tables on wiki pages is neither an accurate nor an efficient way to do what I intended, since the tables on each chemical element's wiki page have different things in their tags. That makes it hard to write a table-delete function that generalizes to all pages. I ended up using HTML::TreeBuilder to look for links within <p></p> tags, since the kind of links I'm looking for are very likely to appear in paragraphs. It yielded much more accurate results and ran pretty fast.
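The commenter's code is not shown, so the following is only a sketch of what a paragraph-only link search with HTML::TreeBuilder might look like. The function name paragraph_links is hypothetical; new_from_content, look_down, attr and delete are the module's own methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the href of every <a> element that sits inside a <p> element.
sub paragraph_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @hrefs;
    for my $para ( $tree->look_down(_tag => 'p') ) {
        # href => qr/./ skips anchors that have no href or an empty one.
        for my $anchor ( $para->look_down(_tag => 'a', href => qr/./) ) {
            push @hrefs, $anchor->attr('href');
        }
    }
    $tree->delete;    # release the parse tree explicitly
    return @hrefs;
}
```

The returned values are the raw href attributes, so relative links would still need to be resolved against the page URL.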

Z.Zen 2010-09-15 02:46:39
