ansaurus

Question

How do I extract an HTML title with Perl?

Answer 1

+6 A:

I would use pQuery. It works just like jQuery.

You can say:

use feature 'say';
use pQuery;
my $page = pQuery("http://google.com/");
my $title = $page->find('title');
say "The title is: ", $title->html;

Replacing stuff is similar:

$title->html('New Title');
say "The entirety of google.com with my new title is: ", $page->html;

You can pass an HTML string to the pQuery constructor, which it sounds like you want to do.

Finally, if you want to use arbitrary HTML as a "template", and then "refine" that with Perl commands, you want to use Template::Refine.

jrockway 2009-02-22 02:46:36

Answer 2

A:

If you just want to extract the page title you can use a regular expression. I believe that would be something like:

my ($title) = $html =~ m/<title>(.+)<\/title>/si;

where your HTML page is stored in the string $html. In si, the s stands for single-line mode (i.e., the dot also matches a newline) and the i for ignore case.

2009-02-22 03:10:55
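
To see what the two flags contribute, here is a small sketch using only core Perl. The sample markup in $html is made up for illustration: it writes the title tags in upper case and spreads the title over several lines, so the pattern only matches when both s and i are present.

```perl
use strict;
use warnings;

# Hypothetical sample page: upper-case tags, title split across lines.
my $html = "<HTML><HEAD><TITLE>\nMy Page\n</TITLE></HEAD><BODY></BODY></HTML>";

# With /s the dot matches newlines; with /i <title> also matches <TITLE>.
my ($with_flags) = $html =~ m/<title>(.+)<\/title>/si;

# Without the flags the same pattern finds nothing in this input.
my ($without_flags) = $html =~ m/<title>(.+)<\/title>/;

# Trim the surrounding whitespace that the capture picks up.
$with_flags =~ s/^\s+|\s+$//g;

print "with flags: $with_flags\n";
print "without flags: ", defined $without_flags ? $without_flags : "(no match)", "\n";
```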

The results will not be what you want if there's another </title> on the page after the end of the actual title. In general, parsing HTML with regular expressions is a limited proposition.

Andy Lester 2009-02-22 03:27:50
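
The failure mode described in this comment is easy to reproduce. The sketch below uses a made-up page in which a second </title> appears in the body, and compares the greedy (.+) from the answer with a non-greedy (.*?). The non-greedy quantifier only fixes this particular case; it is not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical page with a stray </title> later in the body.
my $html = '<html><head><title>Real Title</title></head>'
         . '<body><p>Close the tag with </title>.</p></body></html>';

# Greedy: .+ runs to the LAST </title> in the document.
my ($greedy) = $html =~ m/<title>(.+)<\/title>/si;

# Non-greedy: .*? stops at the FIRST </title>.
my ($lazy) = $html =~ m/<title>(.*?)<\/title>/si;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```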

Answer 3

+1 A:

It's not clear to me what you are asking. You seem to be talking about something that could run in the user's browser, or at least something that already has an HTML page loaded.

If that's not the case, the answer is URI::Title.

ysth 2009-02-22 03:33:29

Answer 4

+2 A:

HTML::HeadParser does this for you.

brian d foy 2009-02-23 18:41:21

The link you posted returns search results that do not include HTML::HeadParser. I had to look around for it: http://search.cpan.org/~gaas/HTML-Parser/

gpojd 2009-02-23 18:45:03

Ah, yes, the HTML::HeadParser module comes with HTML::Parser.

brian d foy 2009-02-23 19:03:24
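
A minimal sketch of how HTML::HeadParser can be used, assuming the page is already in a string (the sample markup here is invented for illustration). The parser reads the head section and exposes what it finds as headers, so the title is available as the Title header.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical sample page; in practice this would be the fetched HTML.
my $html = '<html><head><title>Example Page</title></head>'
         . '<body><p>Hello</p></body></html>';

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# The parsed <title> is stored under the "Title" header.
my $title = $parser->header('Title');
print "The title is: $title\n";
```

Because the parser stops once it is past the head section, a stray </title> later in the body does not affect the result.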

Answer 5

A:

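use strict;
use warnings;
use LWP::Simple;

# Take the URL from the command line, e.g. http://www.google.com
my $url = $ARGV[0] or die "Specify URL on the cmd line\n";
my $html = get($url) or die "Could not fetch $url\n";

# Print the text between the first pair of title tags.
if ($html =~ m{<title>(.*?)</title>}is) {
    print "$1\n";
}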

Manish Shukla 2010-09-29 11:15:31
