As a maintenance issue I need to routinely (3-5 times per year) copy a repository that now has over 20 million files and exceeds 1.5 terabytes in total disk space. I am currently using RICHCOPY, but have tried others. RICHCOPY seems the fastest, but I do not believe I am getting close to the limits of what my XP machine can do.
I am toying around with using what I have read in The Art of Assembly Language to write a program to copy my files. My other thought is to start learning how to multi-thread in Python to do the copies.
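For reference, the multi-threaded Python option could look roughly like the sketch below. It assumes Python 3 (for `concurrent.futures`, which Python 2.6 does not ship), and the function name `copy_files_threaded` and the default of 4 workers are illustrative choices, not anything measured. Whether threads help at all depends on whether the disks are already the bottleneck.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_files_threaded(pairs, workers=4):
    """Copy (source, destination) pairs with a pool of worker threads.

    Returns the total number of bytes written. Destination directories
    are assumed to exist already.
    """
    def copy_one(pair):
        src, dst = pair
        shutil.copy2(src, dst)          # copies data plus timestamps
        return os.path.getsize(dst)

    # Note: Executor.map submits every pair up front, so for a very large
    # tree the pairs should be fed in batches rather than all at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_one, pairs))
```

The threads spend most of their time waiting on the disk, so the interpreter itself is unlikely to be what limits a copy like this.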
The Assembly idea appeals to me because it seems interesting, but while my time is not incredibly precious, it is precious enough that I want a sense of whether I would see significant gains in copy speed. I am assuming that I would, but I only started really learning to program 18 months ago and it is still more or less a hobby. Thus I may be missing some fundamental concept of what happens with interpreted languages.
Any observations or experiences would be appreciated. Note, I am not looking for any code. I have already written a basic copy program in Python 2.6 that is no slower than RICHCOPY. I am looking for observations on which approach will give me more speed. Right now it takes me over 50 hours to make a copy from a disk to a Drobo and then back from the Drobo to a disk. I have a LogicCube for when I am simply duplicating a disk, but sometimes I need to go from a disk to the Drobo or the reverse. Given that I can sector copy a 3/4 full 2 terabyte drive using the LogicCube in under seven hours, I am thinking I should be able to get close to that using Assembly, but I don't know enough to know if this is valid. (Yes, sometimes ignorance is bliss.)
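To put rough numbers on that comparison, here is a back-of-envelope calculation. It assumes the 50 hours covers both legs (disk to Drobo and back), uses decimal units, and treats "over 1.5 terabytes", "over 20 million files" and "under seven hours" as exact figures, so the results are only approximate.

```python
# Back-of-envelope throughput from the round figures in this post.
TB = 10**12   # decimal terabyte (assumption)
MB = 10**6    # decimal megabyte (assumption)

def rates(total_bytes, file_count, hours):
    """Return (MB per second, files per second, milliseconds per file)."""
    seconds = hours * 3600.0
    return (total_bytes / MB / seconds,
            file_count / seconds,
            1000.0 * seconds / file_count)

# File copy: two legs of 1.5 TB and 20 million files, 50 hours in total.
file_mb_s, files_s, ms_per_file = rates(2 * 1.5 * TB, 2 * 20 * 10**6, 50)

# Sector copy: 1.5 TB of data on the LogicCube in 7 hours.
sector_mb_s = 1.5 * TB / MB / (7 * 3600.0)

print("file copy:   %.1f MB/s, %.0f files/s, %.1f ms per file"
      % (file_mb_s, files_s, ms_per_file))
print("sector copy: %.1f MB/s" % sector_mb_s)
```

With those figures the file copy works out to roughly 17 MB/s and about 4.5 ms per file, against roughly 60 MB/s for the sector copy. The average file is only about 75 KB, so the per-file cost may matter more than raw transfer speed.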
The reason I need to speed it up is that I have had two or three cycles where something happened during the copy (fifty hours is a long time to expect the world to hold still) that forced me to trash the copy and start over. For example, last week the water main broke under our building and shorted out the power.
Thanks for the early responses, but I don't think it is an I/O limitation. I am not going over a network: the drive is plugged into my motherboard with a SATA connection and my Drobo is plugged into a FireWire port. My thinking is that both connections should allow faster transfer.
Actually, I can't use a sector copy except when going from a single disk to the Drobo. It won't work the other way since the Drobo file structure is a mystery. My unscientific observation is that a copy from one internal disk to another is no faster than a copy between the Drobo and an internal disk.
I am bound by the hardware; I can't afford 10K RPM 2 terabyte drives (if they even make them).
A number of you are suggesting a file syncing solution, but that does not solve my problem. First off, the file syncing solutions I have played with build a map (for want of a better term) of the data first, and I have too many little files, so they choke. One of the reasons I use RICHCOPY is that it starts copying immediately; it does not use memory to build a map. Second, I had one of my three Drobo backups fail a couple of weeks ago. My rule is that if I have a backup failure, the other two have to stay offline until the new one is built. So I need to copy from one of the three single-drive backup copies that I use with the LogicCube.
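To illustrate what "starts copying immediately" means here, a minimal sketch follows. It assumes Python 3, and the name `stream_copy` is hypothetical; it is not how RICHCOPY works internally, only the same general behaviour of copying each directory as it is reached instead of listing the whole tree first.

```python
import os
import shutil

def stream_copy(src_root, dst_root):
    """Copy a tree one directory at a time; return the number of files copied."""
    copied = 0
    # os.walk is a generator: it yields each directory as it is reached,
    # so no listing of the whole tree is built before copying starts.
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(target_dir, name))
            copied += 1
    return copied
```

A call such as `stream_copy("D:\\repo", "E:\\repo")` (both paths hypothetical) begins writing files as soon as the first directory has been read.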
At the end of the day I have to have a good copy on a single drive because that is what I deliver to my clients. Because my clients have diverse systems I deliver to them on SATA drives.
I rent some cloud space from someone where my data is also stored as the deepest backup, but it is expensive to pull it off of there.