I'm using Perl.

I have a tag, for example "XYZ_PKM_HTML". I would like to provide a base URL, for example www.example.com, and then find the HTML page where this tag appears (not necessarily the main page; that's easy). Is it possible? Any ideas? (Or ready-made modules? I looked on CPAN and found some interesting stuff, but nothing installable.)

Thanks,

+1  A: 

You seem to want to implement a website crawler and a searcher. The former is usually done with WWW::Mechanize and the latter with HTML::Twig.

DVK
First of all, thanks for the reply. Secondly, I'm already familiar with Mechanize, but since I've never actually implemented a crawler, I'm wondering how to tackle it: how do I make it follow all of the site's own links, and not links unrelated to the site (ads and such)? Also, it could run for quite some time if the site has lots of pages. Any recommendations?
soulSurfer2010
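One possible way to approach the questions in the comment above is a breadth-first crawl. It keeps a hash of URLs already seen, only queues links whose host matches the start URL's host, and stops after a fixed number of pages. The following is a minimal sketch under those assumptions, not a finished crawler. The names `host_of`, `find_pages_with_tag`, and `mechanize_fetcher`, and the default cap of 500 pages, are made up for illustration. The host check is a simple regex, so `www.example.com` and `example.com` count as different sites, and the tag is matched as a plain substring of the page source.

```perl
use strict;
use warnings;

# Extract the lowercased host from an absolute http(s) URL; undef otherwise.
sub host_of {
    my ($url) = @_;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc($1) : undef;
}

# Breadth-first crawl from $start, staying on the start URL's host.
# $fetch is a code ref: given a URL it returns ($content, @absolute_links),
# or an empty list on failure. Returns the URLs whose content contains $tag.
sub find_pages_with_tag {
    my ($start, $tag, $fetch, $max_pages) = @_;
    $max_pages //= 500;    # assumed cap, so the run time stays bounded
    my $site = host_of($start) // die "Not an absolute http(s) URL: $start\n";

    my @queue   = ($start);
    my %seen    = ($start => 1);
    my @hits;
    my $fetched = 0;

    while (@queue && $fetched < $max_pages) {
        my $url = shift @queue;
        my ($content, @links) = $fetch->($url);
        next unless defined $content;
        $fetched++;
        push @hits, $url if index($content, $tag) >= 0;

        for my $link (@links) {
            $link =~ s/#.*//s;    # a fragment points into the same page
            my $host = host_of($link);
            # Skip off-site links (ads and such) and non-http links.
            next unless defined $host && $host eq $site;
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @hits;
}

# A fetcher built on WWW::Mechanize, loaded lazily so the crawl logic
# above can be exercised without it.
sub mechanize_fetcher {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 0);
    return sub {
        my ($url) = @_;
        $mech->get($url);
        return unless $mech->success && $mech->is_html;
        return ($mech->content, map { $_->url_abs->as_string } $mech->links);
    };
}

# Usage (needs network access and WWW::Mechanize installed):
#   my @pages = find_pages_with_tag('http://www.example.com/',
#                                   'XYZ_PKM_HTML', mechanize_fetcher());
#   print "$_\n" for @pages;
```

Passing the fetcher in as a code ref keeps the crawl logic separate from the HTTP layer, so it can be tried against a fake site in a hash before pointing it at a real one. For a large site, a polite delay between requests and a check of robots.txt would also be worth adding.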
+4  A: 

MJD has an extended example on writing a web spider in Higher-Order Perl. It is section 4.7. See page 187 in Chapter 4.

Of course, you can also try the WWW::SimpleRobot module he mentions.

Sinan Ünür