views:

68

answers:

3

I hope this is my last question on this. I am using $mech->follow_link to try to download a file. For some reason, though, the file saved is just the page I first pull up, not the link I want to follow. Is this the correct way to download the file from the link? I do not want to use wget.

    #!/usr/bin/perl -w
    use strict;
    use LWP;
    use WWW::Mechanize;
    my $now_string = localtime;
    my $mech = WWW::Mechanize->new();
    my $filename = join(' ', split(/\W++/, $now_string, -1));
    $mech->credentials( '***********', '************' ); # if you need to supply server and realm, use credentials as in the LWP docs
    $mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/') or die "Error: failed to load the web page";
    $mech->follow_link( url_regex => qr/MESH/i ) or die "Error: failed to download content";
    $mech->save_content("$filename.kmz");
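As an aside, a sketch of building a filesystem-safe timestamp filename with the core POSIX module, instead of splitting the localtime string on non-word characters (the `.kmz` extension is carried over from the code above):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# e.g. "20100707-130536.kmz" -- digits and a dash only, so it is
# safe as a filename on every platform
my $filename = strftime('%Y%m%d-%H%M%S', localtime) . '.kmz';
print "$filename\n";
```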
A: 

Change if to unless.
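Note that the `or die` after `get` can never fire in the first place: `$mech->get` returns a response object, and in Perl any blessed reference is true in boolean context. A minimal stand-in object (not the real WWW::Mechanize, just an illustration) demonstrates this:

```perl
use strict;
use warnings;

# $mech->get() returns a response *object*; a blessed reference is
# always true, so "get(...) or die" never dies, even for a failed
# request.  Check $mech->success instead.
my $response = bless { success => 0 }, 'FakeResponse';

my $died = 0;
$response or $died = 1;   # right-hand side never runs

print $died ? "or-die fired\n" : "or-die never fired\n";
```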

mcandre
It still downloads the same page and gives no errors
shinjuo
Change the other _if_ to _unless_.
mcandre
I changed both. I updated the code above so you can see it
shinjuo
Try removing *or die*...
mcandre
+1  A: 

Are you sure you want the 3rd link called 'MESH'?

Zaid
No, I did not realize until I went back and looked that it was searching for that specific link. It still does not work correctly, but it's a start. Thanks
shinjuo
+3  A: 

Steps to try

  1. First print the contents from your get, to make sure you're reaching a valid HTML page
  2. Make sure the link you're going to is the third link called "MESH" (case-sensitive?)
  3. Print the contents from your second get
  4. Print the filename to make sure it's well-formed
  5. Check that the file was created successfully

Additional

  • You don't need the unless in either case - it's going to work, or it's going to die

Example

#!/usr/bin/perl -w

use strict;
use WWW::Mechanize;

sub main {

    my $url  = qq(http://www.kmzlinks.com);
    my $dest = qq($ENV{HOME}/Desktop/destfile.kmz);

    my $mech = WWW::Mechanize->new(autocheck => 1);

    # if needed, pass your credentials before this call
    $mech->get($url);
    die "Couldn't fetch page" unless $mech->success;

    # find all the links that have urls to kmz files
    my @links = $mech->find_all_links( url_regex => qr/(?:\.|%2E)kmz$/i );

    foreach my $link (@links){

        # use absolute URL path of the link to download file to destination
        $mech->get($link->url_abs, ':content_file' => $dest);

        last;   # only need one (for testing)
    }
}

main();
vol7ron
How do you print the contents? I was trying `print $mech->content( format => 'text' );` but it does not seem to work
shinjuo
So I am getting a new file which seems more correct, but the file it downloads will not open in Google Earth like it should. The file is a .kmz file, which is what I now have it downloading as (I have updated my code), but when I try to open the file it says it cannot be opened.
shinjuo
You can use `print $mech->response()->content()`, `print $mech->content()`, or even `print %{$mech->get($url)}`. The `format => 'text'` will strip the HTML, which, if it's an XML document with just elements and attributes, might strip everything.
vol7ron
The link it is following, though, is not a page, just a download link
shinjuo
I think the problem is that it is trying to save the download page and not the object it refers to. I could be wrong, but the files it downloads are not working properly
shinjuo
Try a manual download. After `$mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/');`, type `$mech->get($urlOfDynamicLink); print $mech->contents();`
vol7ron
I used this: $mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/MESH_Max_180min_20100707-130536.kmz'); and it still doesn't work. Is this the way you meant it?
shinjuo
Okay, it should be print $mech->content(); On that note, it prints a bunch of garbled symbols and beeps a lot, then finishes. I figure it's garbled symbols because it's a .kmz file, which is kind of like an image file
shinjuo
I think it's the type of file I am trying to download. Is there any way just to get the file to download, rather than copying and saving its contents?
shinjuo
$mech->get() downloads the file; then you want to save it somewhere. A .kmz file is a compressed KML file. You can use a ZIP 2.0 algorithm to inflate it, i.e., open it with WinZip after you've saved it.
vol7ron
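Following on from that, a sketch of pulling the KML out of a saved .kmz with the core IO::Uncompress::Unzip module (the helper name kmz_to_kml and the file paths are made up for illustration):

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# A .kmz is a ZIP archive whose first member is normally doc.kml;
# the one-shot unzip() extracts the first member by default.
sub kmz_to_kml {
    my ($kmz_path, $kml_path) = @_;
    unzip $kmz_path => $kml_path
        or die "unzip failed: $UnzipError\n";
    return $kml_path;
}

# e.g. kmz_to_kml('saved.kmz', 'saved.kml');
```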
you can also save the file in one step: `$mech->get( 'http://datawww2.wxc.com/kml/echo/MESH_Max_180min/MESH_Max_180min_20100707-130536.kmz', ':content_file' => '20100707_130536.kmz' );`
vol7ron
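A sketch of the same one-step save with the local filename derived from the link's URL. remote_basename is a made-up helper, and the commented-out lines assume the $mech and $link from the answer above:

```perl
use strict;
use warnings;

# Take everything after the last slash as the local filename.
sub remote_basename {
    my ($url) = @_;
    (my $name = $url) =~ s{^.*/}{};
    return $name;
}

# With :content_file, the response body goes straight to disk
# instead of into $mech->content:
# my $file = remote_basename($link->url_abs);
# $mech->get($link->url_abs, ':content_file' => $file);
```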
It won't let me open it in WinZip... not sure what the problem is. I need a site that everyone can see to test it on.
shinjuo
It's like the encoding is different than it should be.
shinjuo
I've modified the answer to give you a working example.
vol7ron
Wow, that works! I changed it around a little bit so that it now only pulls one file. THANKS A LOT! I appreciate all your help
shinjuo
I'm not attesting to its security or robustness; it's just a bare-bones example of how it should work. Glad you got it working.
vol7ron
Security should not be a problem, since I am just running it off my desktop using Task Scheduler. It is not on the web in any way except for the two seconds it runs to do its thing. Is there a possible security issue in that?
shinjuo
Whenever a transfer is involved, there are possible security threats. There's an opening between client and host for various methods to push through, but that is a whole other topic. For what you're doing, you should be fine, provided the content is from a trusted source.
vol7ron
Yeah, it is. The site is really just a directory of files.
shinjuo
I really do appreciate your help; I have learned a lot from just you. I changed the program to handle just one file and made the name change to the date/time. Is it the url_abs that is making it work, compared to what I had?
shinjuo
It could be; I'm not sure what you had before, and it's difficult to see what the page is doing when I can't access it. If the URLs had relative paths, then possibly. Also, I think `$mech` would be empty if you chose to store the HTML file with the first `->get()` using `:content_file =>`. There are many reasons why it might not have worked.
vol7ron