views:

84

answers:

5

Ok so i have 6.5 Million images in a folder and I need to get them moved asap. I will be moving them into their own folder structure but first I must get them moved off this server.

I tried rsync and cp and all sorts of other tools but they always end up erroring out. So i wrote a perl script to pull the information in a more direct method. Using opendir and having it count all the files works perfect. It can count them all in about 10 seconds. Now I try to just step my script up one more notch and have it actually move the files and I get the error "File too large". This must be some sort of false error as the files themselves are all fairly small.

#!/usr/bin/perl
#############################################
# CopyFilesLite
# Russell Perkins
# 7/12/2010
#
# Tool is used to copy millions of files
# while using as little memory as possible. 
#############################################

use strict;
use warnings;
use File::Copy;

#dir1, dir2 passed from command line
my $dir1 = shift;
my $dir2 = shift;
#Varibles to keep count of things
my $count = 0;
my $cnt_FileExsists = 0;
my $cnt_FileCopied = 0;

#simple error checking and validation
die "Usage: $0 directory1 directory2\n" unless defined $dir2;
die "Not a directory: $dir1\n" unless -d $dir1;
die "Not a directory: $dir2\n" unless -d $dir2;

opendir DIR, "$dir1" or die "Could not open $dir1: $!\n";
while (my $file = readdir DIR){
  if (-e $dir2 .'/' . $file){
   #print $file . " exsists in " . $dir2 . "\n"; #debuging 
   $cnt_FileExsists++;
  }else{
   copy($dir1 . '/' . $file,$dir2 . '/' . $file) or die "Copy failed: $!";
   $cnt_FileCopied++;
   #print $file . " does not exsists in " . $dir2 . "\n"; #debuging 
  }
  $count++;
}
closedir DIR;

#ToDo: Clean up output. 
print "Total files: $count\nFiles not copied: $cnt_FileExsists\nFiles Copied: $cnt_FileCopied\n\n";

So have any of you ran into this before? What would cause this and how can it be fixed?

A: 

I am not sure if this is related to your problem, but readdir will return a list of all directory contents, including subdirectories, if present, and the current (.) and parent directories (..) on many operating systems. You may be attempting to copy directories as well as files. The following will not attempt to copy any directories:

while (my $file = readdir DIR){
    next if -d "$dir1/$file";
toolic
It does see the . and .. but since they are in both locations it skips them. It only copies files that are in dir1 but not dir2
The Digital Ninja
Do you still get the copy error with my code?
toolic
+2  A: 

On your error handling code, could you please change or die "Copy failed: $!"; to 'or die "Copy failed: '$dir1/$file' to '$dir2/$file': $!";' ?

Then it should tell you where the error happens.

Then check 2 things -

1) Does it fail every time on the same file?

2) Is that file somehow special? Weird name? Unusual size? Not a regular file? Not a file at all (as the other answer theorized)?

DVK
A: 

Seems this was an issue either with my nfs mount of the server that it was mounted to. I hooked up a usb drive to it and the files are copying with extreme speed...if you count usb 2 as extreme.

The Digital Ninja
nfs has a hard limit to max files that can be in a directory.
The Digital Ninja
A: 

6.5 million images in one folder is very extreme and puts a load on the machine just to read a directory, whether it's in shell or Perl. That's one big folder structure.

I know you're chasing a solution in Perl now, but when dealing with that many files from the shell you'll want to take advantage of the xargs command. It can help a lot by grouping the files into manageable chunks. http://en.wikipedia.org/wiki/Xargs

Greg
A: 

Maybe the file system of partition you are send the data to do not support very large data.

Lucas Arbiza