views:

66

answers:

1

A directory exists with a total of 2,153,425 items (according to Windows folder Properties). It contains .jpg and .gif image files located within a few subdirectories. The task was to move the images into a different location while querying each file's name to retrieve some relevant info and store it elsewhere.

The script that used File::Find finished at 20462 files. Out of curiosity I wrote a tiny recursive function to count the items which returned a count of 1,734,802. I suppose the difference can be accounted for by the fact that it didn't count folders, only files that passed the -f test.

The problem itself can be solved differently by querying for file names first instead of traversing the directory. I'm just wondering what could've caused File::Find to finish at a small fraction of all files.

The data is stored on an NTFS file system.

Here is the meat of the script; I don't think including DBI stuff would be relevant since I reran the script with nothing but a counter in process_img() which returned the same number.

find(\&process_img, $path_from);

sub process_img {
    eval {
        return if ($_ eq "." or $_ eq "..");

        ## Omitted querying and composing new paths for brevity.

        make_path("$path_to\\img\\$dir_area\\$dir_address\\$type");
        copy($File::Find::name, "$path_to\\img\\$dir_area\\$dir_address\\$type\\$new_name");
    };
    if ($@) { print STDERR "eval barks: $@\n"; return }
}

EDIT:

eval barked a few times regarding BDI errors:

DBD::CSV::db do failed: Mismatched single quote before:
'INSERT INTO img_info (path, type, floorplan, legacy_id)
        VALUES (
            ?0?building?1?0?2?19867'
        )' at C:/perl/site/lib/SQL/Statement.pm line 77
[for Statement "
INSERT INTO img_info (path, type, floorplan, legacy_id)
        VALUES (
            'wal/15 Broad Street/building/nyc-Wall-St--South-St--Seaport-condo-elevator- building-52201501.jpg',
            'building',
            '0',
            '19867'
        )
"]

I assume that's due to the double dash between 'St' and 'South'. No errors of other nature were reported.

And here is another method I used to count files:

count_images($path_from);
sub count_images {
    my $path = shift;

    opendir my $images, $path or die "died opening $path";
    while (my $item = readdir $images) {
        next if $item eq '.' or $item eq '..';
        $img_counter++ && next if -f "$path/$item";
        count_images("$path/$item") if -d "$path/$item";
    }
    closedir $images or die "died closing $path";
}

print $img_counter;
+2  A: 

It could have run out of resources? (memory, file descriptors, etc...?).

or it could have been some funky file name (easy to test by re-running again but removing 10 files - if it stops on exact same file, that filename is the culprit)

If you can trace the memory footprint, that'd tell you if you have a memory leak (see recent SO question on memory leaks to help with that).

And as Ether said, we could offer hopefully more than general investigative idea had you pasted in the code.

UPDATE

Based one your code:

  1. Please indicate whether eval barks anything to STDERR

  2. More importantly, any IO operations need to be error-checked. E.g.

    copy($something,$other)
        || die "Copy $something to $other died with error: $!\n"; # or print
    # Same for making the directory
    
DVK