ansaurus

Question

How do I extract links in JavaScript that point to HTML pages in Perl?

Answer 1

+2 A:

Yes, HTML::LinkExtor does not understand JavaScript. In fact, it's pretty unlikely that you'll find anything that recognizes URLs embedded in JavaScript, simply because that would typically require running the actual code.

Randal Schwartz 2009-11-25 20:50:22

Answer 2

+1 A:

Perl gives you plenty of brute-force ways to do this. You could use a push or pull parser (HTML::Parser or HTML::TokeParser) to jump between tags. Or you could simply slurp the entire page and run a regex over it to find links, including links inside JavaScript.
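Here is a rough sketch of the slurp-and-regex approach using only core Perl. It gathers the code from inline script blocks and javascript: hrefs, then reports quoted string literals ending in .html or .htm. The function name js_html_links and the sample page are made up for illustration, and regexes are only a heuristic for HTML, not a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic sketch (not a real HTML/JavaScript parser): collect JavaScript
# source from a page, then return quoted literals that look like .html/.htm URLs.
sub js_html_links {
    my ($html) = @_;
    my @js;

    # Bodies of inline <script>...</script> blocks.
    push @js, $html =~ m{<script\b[^>]*>(.*?)</script>}gis;

    # Code inside href="javascript:..." (either quote style).
    while ($html =~ m{\bhref\s*=\s*(["'])\s*javascript:(.*?)\1}gis) {
        push @js, $2;
    }

    my (%seen, @links);
    for my $code (@js) {
        # A quoted string ending in .html or .htm, optionally followed by
        # a ?query or #fragment; each URL is reported only once.
        while ($code =~ m{(['"])([^'"\s]+?\.html?(?:[?#][^'"\s]*)?)\1}gi) {
            push @links, $2 unless $seen{$2}++;
        }
    }
    return @links;
}

# Hypothetical sample page.
my $page = <<'HTML';
<a href="javascript:openpopup('help.html')">Help</a>
<a href="/plain.html">Plain link</a>
<script type="text/javascript">
  var about = "about/index.htm";
  function go(n) { location.href = 'page' + n + '.html'; }
</script>
HTML

print "$_\n" for js_html_links($page);
```

This prints about/index.htm and help.html. The plain /plain.html link is skipped because it is not JavaScript, and the URL that go() assembles at run time is missed entirely, which is exactly the limitation described in the first answer.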

Have you looked at WWW::Mechanize::Plugin::JavaScript? WWW::Mechanize is a web bot's best friend (not that you are trying to write a bot). I've used it before and can say it's one of the best Perl modules on CPAN.

Here is an example from its CPAN documentation, which sets the named variable to the given value:

$m->plugin('JavaScript')->set(
      'document', 'location', 'href' => 'http://www.perl.org/');

JulianK 2009-11-25 21:22:10

It is a great module, and its FAQ is very funny, particularly because so many people ask for JavaScript support: http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod

AmbroseChapel 2009-11-25 23:34:07

Answer 3

A:

I'd use WWW::Mechanize for most link gathering. For links buried in JavaScript, I'd do my own matching, for example on a page that opens its links through an openpopup() call:

my @links = $content =~ m`javascript:openpopup\('([^\']+)'`g;
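A self-contained version of the one-liner above, extended to keep only HTML pages and drop duplicates. The sample markup is hypothetical, and openpopup is just whatever function name the target page happens to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content; in practice this could come from
# WWW::Mechanize's content method after a get().
my $content = <<'HTML';
<a href="javascript:openpopup('docs/intro.html')">Intro</a>
<a href="javascript:openpopup('images/logo.png')">Logo</a>
<a href="javascript:openpopup('docs/intro.html')">Intro again</a>
<a href="faq.html">FAQ</a>
HTML

# Capture the first single-quoted argument of every
# javascript:openpopup('...') call. Backticks are just the match
# delimiter (any punctuation pair works in Perl).
my @links = $content =~ m`javascript:openpopup\('([^']+)'`g;

# Keep only arguments that point at HTML pages, dropping duplicates
# while preserving first-seen order.
my %seen;
my @html_pages = grep { /\.html?\z/i && !$seen{$_}++ } @links;

print "$_\n" for @html_pages;    # prints docs/intro.html
```

The regex finds three openpopup arguments; the grep removes the image and the repeated page, and faq.html is not reported because it is an ordinary link rather than an openpopup call.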

chris d 2009-11-25 21:24:34
