Hi,

I'm running Python 2.6.2 on XP. I have a large number of text files (100k+) spread across several folders that I would like to consolidate in a single folder on an external drive.

I've tried using shutil.copy(), shutil.copytree(), and distutils.file_util.copy_file() to copy files from source to destination. None of these methods has successfully copied all the files from a source folder; each attempt has ended with IOError: [Errno 13] Permission denied, at which point I am unable to create any new file in the destination folder.
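
Roughly, the loop I'm running looks like this (the paths here are placeholders, not my real folders):

    import os
    import shutil

    SRC_ROOT = r"C:\data\textfiles"   # placeholder source tree
    DST_DIR = r"E:\consolidated"      # placeholder folder on the external drive

    # Walk the source tree and copy every file into one destination folder.
    for dirpath, dirnames, filenames in os.walk(SRC_ROOT):
        for name in filenames:
            shutil.copy(os.path.join(dirpath, name), DST_DIR)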

I have noticed that every destination folder I've used, regardless of the source folders, has ended up with exactly 13,106 files. I cannot open any new file for writing in a folder that holds this many (or more) files, which may be why I'm getting Errno 13.

I'd be grateful for suggestions on why this problem is occurring and how to get around it.

many thanks, nick

+2  A: 

Are you using FAT32? The maximum number of directory entries in a FAT32 folder is 65,534. If a filename is longer than 8.3, it takes more than one directory entry. If you are conking out at 13,106, that indicates each filename is long enough to require five directory entries.

Solution: Use an NTFS volume; it has no per-folder limit and supports long filenames natively (that is, without spending multiple 8.3 entries per name). The total number of files on an NTFS volume is limited to around 4.3 billion, and they can be distributed among folders in any combination.
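
If you're not sure how the drive is formatted, you can ask Windows from Python. A small sketch using ctypes (the E: drive letter is an assumption; adjust to taste):

    import ctypes

    def volume_filesystem(root):
        """Return the filesystem name ('NTFS', 'FAT32', ...) of a Windows
        volume root, via the Win32 GetVolumeInformationW call."""
        fs_name = ctypes.create_unicode_buffer(256)
        ok = ctypes.windll.kernel32.GetVolumeInformationW(
            ctypes.c_wchar_p(root),   # volume root, e.g. u"E:\\"
            None, 0,                  # skip the volume label
            None, None, None,         # skip serial number, name length, flags
            fs_name, len(fs_name))    # receive the filesystem name
        if not ok:
            raise ctypes.WinError()
        return fs_name.value

    print(volume_filesystem(u"E:\\"))   # e.g. FAT32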

kindall
@kindall: adding the link: http://technet.microsoft.com/en-us/library/bb457112.aspx
pyfunc
Thank you for the very helpful answer and link. When I get another external drive or have the occasion to reformat my current one, I'll remember to use NTFS. Until then I think I'll need to introduce a directory structure to divide and conquer the copying.
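
Something along these lines (a rough sketch; the paths and bucket size are made up) should keep each destination folder well under the limit:

    import os
    import shutil

    SRC_ROOT = r"C:\data\textfiles"   # placeholder paths
    DST_ROOT = r"E:\consolidated"
    BUCKET_SIZE = 5000                # stay well below the FAT32 ceiling

    copied = 0
    for dirpath, dirnames, filenames in os.walk(SRC_ROOT):
        for name in filenames:
            # Start a new numbered subfolder every BUCKET_SIZE files.
            bucket = os.path.join(DST_ROOT, "part%03d" % (copied // BUCKET_SIZE))
            if not os.path.isdir(bucket):
                os.makedirs(bucket)
            shutil.copy(os.path.join(dirpath, name), bucket)
            copied += 1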
nswitanek
A: 

I wouldn't have that many files in a single folder; it is a maintenance nightmare. BUT if you need to, don't do this on FAT: you get at most 64k files in a FAT folder.

Read the error message

Your specific problem could also be that, as the error message suggests, you are hitting a file which you can't access. And there's no reason to believe that the count of files before this happens should change. It is a computer, after all, and you are repeating the same operation.
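
One way to check is to log exactly which file fails; a sketch with placeholder paths:

    import os
    import shutil

    SRC_ROOT = r"C:\data\textfiles"   # placeholder paths
    DST_DIR = r"E:\consolidated"

    for dirpath, dirnames, filenames in os.walk(SRC_ROOT):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy(src, DST_DIR)
            except IOError as e:
                # Same path on every run: that file is the problem.
                # Different paths but the same count: suspect the folder.
                print("failed on %s: %s" % (src, e))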

knitti
Thanks for the helpful advice. I too thought I might just be running into the same file at step 13106, but I maxed out at the same number of files when copying several different directories, so I think the issue was having long file names, as suggested in the above responses.
nswitanek
OK, then. Divide and conquer :-)
knitti
A: 

I predict that your external drive is formatted FAT32 and that the filenames you're writing to it are somewhere around 45 characters long.

FAT32 can have only 65,536 directory entries in a directory, because directory entry offsets are stored in 16-bit numbers. Long filenames use multiple directory entries each, and "." always takes up one entry. That you max out at 65536/5 - 1 = 13,106 files strongly suggests that your filenames take up five entries each and that you have a FAT32 filesystem.
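
For what it's worth, the numbers line up if each VFAT long-filename entry holds 13 characters of the name (a quick sketch of the arithmetic):

    import math

    ENTRIES_PER_DIR = 65536   # 16-bit directory entry offsets
    NAME_LEN = 45             # assumed filename length

    # One short 8.3 entry plus one LFN entry per 13 characters of long name.
    entries_per_file = 1 + int(math.ceil(NAME_LEN / 13.0))
    print(entries_per_file)                         # -> 5
    print(ENTRIES_PER_DIR // entries_per_file - 1)  # -> 13106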

Additionally, you do not want to search through directories with many thousands of entries in FAT -- lookup is linear. That is, fopen(some_file) makes the OS march through the directory's entries from the beginning, every time, until it either finds some_file or runs off the end of the list.

Short answer: Directories are a good thing.

Eric Towers
Illuminating answer, thank you. Yes, filenames are either 40 or 41 characters long including file extension suffix, and I think the external drive is indeed formatted FAT32. I had thought it'd be easier to avoid a directory structure, but evidently there are important tradeoffs I wasn't aware of. Thanks again.
nswitanek