Do I gain something when I transform my $url like this: $url = URI->new( $url )?

#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use URI;
use XML::LibXML;

my $url = 'http://stackoverflow.com/';
$url = URI->new( $url );

my $doc = XML::LibXML->load_html( location => $url, recover => 2 );
my @nodes = $doc->getElementsByTagName( 'a' );
say scalar @nodes;
+4  A: 

The URI constructor cleans up the URI for you, for example by correctly escaping characters that are not valid in a URI (see URI::Escape).
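As a rough illustration of that escaping, here is a minimal sketch. The example.com address and the variable names are made up for the example and are not from the question.

```perl
use strict;
use warnings;
use URI;

# Hypothetical input containing spaces, which are not valid in a URI.
my $raw = 'http://example.com/some path/?q=a b';

# The constructor escapes the characters that may not appear in a URI.
my $uri = URI->new($raw);

# The object stringifies to the escaped form.
print "before: $raw\n";
print "after:  $uri\n";
```

Characters that are already valid, including existing %XX escapes, should be left alone, so passing an already clean URL through the constructor is harmless.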

DVK
+3  A: 

The URI module has several benefits:

  • It normalizes the URL for you
  • It can resolve relative URLs
  • It can detect invalid URLs (although you need to turn off the schemeless bits)
  • You can easily filter the URLs that you want to process.

The benefit that you get with the little bit of code that you show is minimal, but as you continue to work on the problem, perhaps spidering the site, URI becomes more handy as you select what to do next.
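The points in the list above could look something like this in a spider. This is only a sketch: the example.com and example.org URLs, the link list, and the variable names are invented for illustration.

```perl
use strict;
use warnings;
use URI;

# Normalizing: canonical() lowercases the scheme and host and drops
# an explicit port when it is the default one for the scheme.
my $canon = URI->new('HTTP://Example.COM:80/docs/index.html')->canonical;

# Resolving: new_abs() turns a relative link into an absolute URL,
# using the page it was found on as the base.
my $abs = URI->new_abs('../images/logo.png', $canon);

# Filtering: keep only http/https links that point at one host.
# (Hypothetical list of links scraped from a page.)
my @links = (
    'http://example.com/a',
    'mailto:someone@example.com',
    'https://other.example.org/b',
);
my @same_host = grep {
    my $u = URI->new($_);
    # Check the scheme first; not every URI subclass has a host method.
    defined $u->scheme
        && $u->scheme =~ /^https?$/
        && $u->host eq 'example.com';
} @links;

print "canonical: $canon\n";
print "absolute:  $abs\n";
print "kept:      @same_host\n";
```

Checking the scheme before calling host matters, because schemes such as mailto have no host component.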

brian d foy
+1  A: 

I'm surprised nobody has mentioned it yet, but $url = URI->new( $url ); doesn't clean up your $url and hand it back to you; it creates a new object of class URI (or, rather, of one of its subclasses), which can then be passed to other code that requires a URI object. That's not particularly important in this case, since XML::LibXML appears to be happy to accept locations as either strings or objects, but some other modules require you to give them a URI object and will reject URLs presented as plain strings.
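A quick way to see the difference, as a sketch (the query string on the URL is made up for the example):

```perl
use strict;
use warnings;
use URI;

my $url = 'http://stackoverflow.com/questions?page=2';
my $uri = URI->new($url);

# The constructor returns an object of a scheme-specific subclass.
my $class = ref $uri;
print "class: $class\n";
print "is a URI object\n" if $uri->isa('URI');

# The object exposes the parts of the URL through methods ...
my ($host, $path, $query) = ($uri->host, $uri->path, $uri->query);
print "host:  $host\n";
print "path:  $path\n";
print "query: $query\n";

# ... and still stringifies to the URL when used as a string.
print "same string\n" if "$uri" eq $url;
```

Code that only needs the text of the URL can keep treating the object as a string, while code that wants a URI object can call its methods.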

Dave Sherohman
Well, I didn't mention it because it's implied that a constructor is giving you back an object. This object, however, has stringification overloaded so you can also just treat it like a string.
brian d foy