Could you please give me some suggestions on how to parse HTML in Perl? I plan to extract keywords (including URL links) from the pages and save them to a MySQL database. I am using Windows XP.

Also, do I first need to download some website pages to the local hard drive with some offline Explorer tool? If I do, could you point me to a good download tool?

+1  A: 

You can use one of many HTML parser modules. If you're familiar with jQuery, the pQuery module would be a good choice, as it ports most of the easy-to-use features of jQuery to Perl for HTML parsing and scraping.

MiffTheFox
@MiffTheFox, +1, thanks for pQuery. I had never heard of it before, and it may be a good starting point for me.
Nano HE
+3  A: 

You can use LWP to retrieve the pages you need to parse, so you don't have to download them by hand first. There are many ways to go about parsing the HTML. You can use regular expressions to find links and keywords (though this is usually not good practice, since regexes break easily on real-world HTML), or parser modules like HTML::TokeParser or HTML::TreeBuilder.
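
A minimal sketch of the LWP plus HTML::TokeParser approach, assuming both modules are installed from CPAN. The helper names fetch_page and extract_links, the 10-second timeout, and the script name in the usage comment are illustrative choices, not part of either module. It only collects links; keyword extraction and the MySQL insert are left out.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::UserAgent;

# Hypothetical helper: fetch a page over HTTP and return its decoded HTML.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);   # timeout is an arbitrary choice
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Hypothetical helper: return [href, link text] pairs for each <a href="..."> in $html.
sub extract_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser\n";
    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};          # attribute hash of the start tag
        next unless defined $href;           # skip anchors without an href
        my $text = $parser->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# Example invocation (script name is made up): perl links.pl http://example.com/
if (@ARGV) {
    for my $link (extract_links(fetch_page($ARGV[0]))) {
        print "$link->[0]\t$link->[1]\n";
    }
}
```

Each pair returned by extract_links could then be written to the database, for example with DBI and placeholders.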

Narthring
I'll try LWP and the Perl HTML modules.
Nano HE
+1  A: 

If you want to download the pages to your hard drive first, the HTTrack website copier/downloader has many more features than any available Perl library.

daxim
Thank you for the tool.
Nano HE