Hello. I'm trying to write my first Perl program. If you think that Perl is a bad language for the task at hand, tell me what language would solve it better.

The program tests connectivity between a given machine and a remote Apache server. First, the program requests the directory listing from the Apache server, then it parses the list and downloads all the files one by one. Should there be a problem with a file (the connection resets before reaching the specified Content-Length), this should be logged and the next file retrieved. There is no need to save the files or even check their integrity; I only need to log the time it takes to complete and all cases where the connection resets.

To retrieve the list of links from the Apache-generated directory index I plan to use a regexp similar to

/href=\"([^\"]+)\"/

Admittedly, the regexp is not debugged yet.

What is the "reference" way to do an HTTP request from Perl? I googled and found examples using many different libraries, some of them commercial. I need something that can detect disconnections (timeout or TCP reset) and handle them.

Another question: how do I store everything captured by my regexp when searching globally, as a list of strings, with minimal coding effort?
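
Ideally I would like to end up with something as short as this hypothetical sketch (untested, and the variable name is a placeholder):

    # collect every href value from the already-fetched index page
    my @links = $html =~ /href="([^"]+)"/g;   # //g in list context returns all captures
    print "$_\n" for @links;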

+9  A: 

You have many questions in one. The answer to the question in the title of your post is to use LWP::Simple.

Most of your other questions are answered in perlfaq9 with appropriate pointers to further information.
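
For instance, a minimal LWP::Simple sketch (the URL is a placeholder and error handling is deliberately thin):

    use strict;
    use warnings;
    use LWP::Simple;

    my $url = 'http://example.com/files/';   # placeholder
    my $content = get($url);                 # returns undef on failure
    die "Couldn't fetch $url\n" unless defined $content;
    print length($content), " bytes retrieved\n";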

Sinan Ünür
+3  A: 

As a more general answer: Perl is a perfectly fine language for doing HTTP requests, as are a host of other languages. If you're familiar with Perl, don't hesitate; there are many excellent libraries available to do what you need.

Robert P
+4  A: 

As for the parsing-markup-with-regular-expressions part of your question: DON'T!

http://htmlparsing.icenine.ca explains some of the reasons why you shouldn't do this. Even though what you're attempting to parse seems simple, use a proper parser.
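
For example, here is a sketch using HTML::LinkExtor from the HTML::Parser distribution (one option among several proper parsers; $html_of_index_page is assumed to already hold the listing):

    use strict;
    use warnings;
    use HTML::LinkExtor;

    my @hrefs;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
    });
    $parser->parse($html_of_index_page);
    $parser->eof;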

genio
Agree with the *DON'T* part. Don't agree with linking to a page that really doesn't explain anything.
Sinan Ünür
Thirded. There are many libraries out there to help you with this, often included in standard Perl distributions. Don't reinvent the wheel, especially when it's a tricky multi-part wheel with six dozen caveats about wheel length, circumference, axle size, and maximum rotation speed!
Robert P
The problem as I see it meets all the preconditions for using a regexp listed at the page you linked. I need to parse the directory listing of a known Apache version configured in a known way; also, I may need the ability to download only files with a certain extension and leave other files unchecked, which is easy with a regex. I already feel bad about using the worst coding practices, though.
Muxecoid
Although the general case calls for an HTML parser, in this case (a directory listing from Apache) a regex is not so bad. Just use a little perspective. However, my HTML::SimpleLinkExtor module is even easier than a regex. :)
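
Something along these lines is all it takes (a sketch; $html_of_index_page is assumed to already hold the listing):

    use HTML::SimpleLinkExtor;

    my $extor = HTML::SimpleLinkExtor->new;
    $extor->parse($html_of_index_page);
    my @hrefs = $extor->href;   # just the href attribute values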
brian d foy
@Muxecoid If those are your requirements, maybe you should use wget ( http://www.gnu.org/software/wget/ ) or cURL ( http://curl.haxx.se/ ).
Sinan Ünür
+9  A: 

As far as the whole problem description goes, I would use WWW::Mechanize. Mechanize is a subclass of LWP::UserAgent that adds stateful behavior and HTML parsing. With mech, you can just do $mech->get($url_of_index_page), and then use $mech->find_all_links(criteria) to select the links to follow.
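
A rough sketch of that approach (the index URL and the url_regex filter are placeholders; autocheck is turned off so a failed download is reported rather than fatal):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 30 );
    $mech->get('http://example.com/files/');                               # placeholder index URL
    my @links = $mech->find_all_links( url_regex => qr/\.(?:log|txt)$/ );  # placeholder filter

    for my $link (@links) {
        my $start    = time;
        my $response = $mech->get( $link->url_abs );
        printf "%s: %s in %d s\n", $link->url,
            $response->is_success ? 'ok' : 'FAILED (' . $response->status_line . ')',
            time - $start;
    }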

hobbs
+1 for Mech. It's kewl.
friedo
This should solve the problem, thanks.
Muxecoid
Oops, it doesn't solve the problem after all. I want to minimize the use of modules that are not in the standard library. Is there any core function to do this? (Perl 5 build 5.002)
Muxecoid
Perl 5.002 is ancient (almost 15 years old at this point!). You're not going to find standard HTTP modules in its library, and I strongly suggest upgrading to a newer version of perl.
Oesor