I am working on a file downloader in Perl (UNIX/Mac OS X, in the terminal). I am looking for libraries that handle checksum verification, progress bars, and similar things that CPAN itself uses. Which libraries and places to look can you recommend? Is there perhaps something finished along these lines that I don't know of?

In more detail:

  1. downloading files with a progress bar
  2. logging all actions taken
  3. file checksum verification
  4. reading and parsing configuration files (for example, in YAML format)
  5. sending the results to a web service
+7  A: 

CPAN is the main place to look for support modules. If you want to do it in Perl, it's quite likely someone's already done it.

For example, for your requirements:

mopoke
Term::ProgressBar is a good pointer, thanks. I am actually looking for more complete packages or libraries that already have part of this functionality. I am sure there is something out there...
z3cko
What are you using to download? HTTP? FTP? Something else? Could you use wget or something similar as a pre-built package that includes progress bars, etc.? Can you also explain "sending results to a web service"?
mopoke
It should be universal. I want to develop a sort of meta format to automatically fetch source files and compile them. The source can be HTTP, FTP, HTTPS and so on. I think curl supports all of these, and there are good libraries for curl. "Sending results to a web service" means posting the compile result, which is rather trivial.
z3cko
Random tip; there's a small outstanding bug in Term::ProgressBar which happens when you try to have it produce output on a different terminal. Bumped into that once, filed a bug, don't think it's been fixed yet.
fennec
+2  A: 

Do not miss LWP. Specifically, LWP::Simple is likely most of what you need to get started. For checksumming, HTTP headers and the like, you probably want the full LWP user agent.
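
As a rough illustration of the full user agent, here is a minimal sketch that streams a download to disk through LWP::UserAgent's `:content_cb` callback and prints a crude percentage. The helper name `download_with_progress`, the 30-second timeout, and the output format are assumptions made for this example; a module such as Term::ProgressBar could replace the `printf` calls.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): fetch $url into $path and
# report progress on STDERR. Returns the number of bytes received.
sub download_with_progress {
    my ($url, $path) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    my $received = 0;

    my $response = $ua->get(
        $url,
        # Called once per chunk of a successful response body.
        ':content_cb' => sub {
            my ($chunk, $res) = @_;
            print {$out} $chunk;
            $received += length $chunk;
            my $total = $res->content_length;   # undef without a Content-Length header
            if ($total) {
                printf STDERR "\r%3d%%", 100 * $received / $total;
            }
            else {
                printf STDERR "\r%d bytes", $received;
            }
        },
    );
    close $out or die "Cannot close $path: $!";
    print STDERR "\n";

    die 'Download failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $received;
}
```

Because LWP dispatches on the URL scheme, the same call should work for HTTP, FTP, and `file://` URLs; HTTPS additionally needs LWP::Protocol::https installed.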

tsee
A: 

Sad to say, you really have to use POE currently. Specifically, you need to use POE::Component::Client::HTTP (for [Keep-Alive](http://search.cpan.org/~gwyn/POE-Component-Server-HTTP-KeepAlive-0.0302/lib/POE/Component/Server/HTTP/KeepAlive.pm) pooling), and probably a few more components. I just did this task: I had to download 150k photos daily (in SQL), store them by the SHA1 of their URL, resize them, hash each image to its SHA1 and hard link that to the SHA1 of the URL, and update a database to show the SHA1 of the image and the date downloaded. I did all of this with POE. Other than a few hard-to-debug quirks that I'll probably never fix, and random POE core dumps, it works rather well.

We provide our third-party affiliates with a much simpler image downloader that takes an image identified in a CSV through the VIN's row, downloads the image, and renames it to vin hyphen [1..n]. You can find it on GitHub. It uses Parallel::ForkManager, which is another solution, but the nature of using it eliminates the keep-alive and pooling that you can get rather easily with POE.

I'd highly suggest not rolling your own threaded solution; history tells us those are typically the worst.

Evan Carroll
Why "sad to say"? Is POE evil?
z3cko
A: 

Look at CPAN::Checksums for the stuff that CPAN uses to create the CHECKSUMS file in each author's directory.
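
For checking a downloaded file against a digest you already know, a minimal sketch with the core Digest::SHA module may be enough. This does not use CPAN::Checksums itself; the helper names `sha256_of_file` and `verify_checksum` and the choice of SHA-256 are assumptions made for this example.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: lowercase hex SHA-256 digest of a file,
# read in binary mode. addfile croaks if the file cannot be read.
sub sha256_of_file {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');
    return $sha->hexdigest;
}

# Hypothetical helper: true if the file's digest matches the
# expected hex string (compared case-insensitively).
sub verify_checksum {
    my ($path, $expected_hex) = @_;
    return sha256_of_file($path) eq lc $expected_hex;
}
```

A caller would typically run `verify_checksum($path, $expected)` right after the download finishes and delete the file if it returns false.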

brian d foy