I have a faulty hard drive that works intermittently. After cold booting, I can access it for about 30-60 seconds, then the hard drive fails. I'm willing to write software to back up this drive to a new and bigger disk. I can develop it under GNU/Linux or Windows, I don't care.

The problem is: I can only access the disk for a short time, and some files are big and will take longer than that to copy. For this reason, I'm thinking of backing up the entire hard disk in smaller pieces, something like BitTorrent does. I'll read some megabytes and store them before trying to read another set. My main loop would be something like this:

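while (1) {
    /* Drive not responding: wait 100 ms and poll again. */
    if (!check_harddrive()) { usleep(100 * 1000); continue; }
    read_some_megabytes();
    /* The drive may have died mid-read: discard the chunk and retry it. */
    if (!check_harddrive()) { usleep(100 * 1000); continue; }
    save_data();
    update_reading_pointer();
    if (all_done) { break; }
}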

The problem is the check_harddrive() function. I'm willing to write this in C/C++ for maximum API/library compatibility. I'll need some control over my file handles to check whether they are still valid, and I need read calls that still return, even with bad data, if the drive fails during the copy process.

Maybe C# would give me best results if I abuse "hardcoded" hardware exceptions?

Another approach would be to measure how long the drive stays usable after a power cycle, write a program that reads only during that window, and have it flag me when it's time to power cycle again.

What would you do in this case? Are there any tools/utilities that already do this?

Oh, there is a GREAT app for reading bad optical media, it's called IsoPuzzle. It's not mine, I just wanted to share something related to my problem.

!EDIT!

Some clarifications. I'm a home user, a student of computer engineering at college, I'd rather lose the data than spend thousands of dollars recovering it. The harddrive is still covered by Seagate's warranty, but since they gave me 5 years of warranty, I wanna try everything possible until the time runs out.

When I say cold booting, I mean booting after some seconds without power. Hot booting would be rebooting your computer; cold booting would be shutting it down, waiting a few seconds, then booting it up again. Since the hard disk in question is internal but SATA, I can just disconnect the power cable, wait a few seconds and connect it again.

For now I'll go with robocopy; I'm just reading up on it to see how I can use it. If I don't need to code it myself and can just script it, it'll be even easier.

!EDIT2!

I wasn't clear: my drive is a Seagate 7200.11. It's known to have bad firmware, and it's not always fixable with a simple firmware update (not after this bug appears). The drive is physically in 100% working condition; only the firmware is screwed, making it enter an infinite busy state after some seconds.

+5  A: 

I would work this from the hardware angle first. Is it an external drive - if so, can you try it in a different case?

You mention cold-booting works, then it quits. Is this heat related? Have you tried using the hard drive for an extended period in something like a freezer?

From the software side, I'd have a second thread keep an eye on a progress counter that is updated by a loop repeatedly reading small amounts of data. That thread could then signal failure after a timeout you define.
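
A minimal sketch of that watchdog idea in C with POSIX threads, assuming the reading loop bumps a shared counter after every successful small read. The names `struct watchdog`, `watchdog_start` and the 10 ms polling interval are made up for illustration and do not come from any library:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Shared state between the reader loop and the watchdog thread. */
struct watchdog {
    atomic_ulong progress;   /* bumped by the reader after every small read */
    atomic_int   stalled;    /* set to 1 by the watchdog on timeout */
    atomic_int   stop;       /* set to 1 to ask the watchdog to exit */
    long         timeout_ms; /* how long the counter may stay unchanged */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Thread body: poll the counter and flag a stall if it stops moving. */
static void *watchdog_main(void *arg)
{
    struct watchdog *wd = arg;
    const long poll_ms = 10;  /* assumed polling interval */
    unsigned long last = atomic_load(&wd->progress);
    long idle_ms = 0;

    while (!atomic_load(&wd->stop)) {
        sleep_ms(poll_ms);
        unsigned long now = atomic_load(&wd->progress);
        if (now != last) {
            last = now;       /* the reader made progress: reset the timer */
            idle_ms = 0;
        } else {
            idle_ms += poll_ms;
            if (idle_ms >= wd->timeout_ms) {
                atomic_store(&wd->stalled, 1);
                break;
            }
        }
    }
    return NULL;
}

/* Initialise the state and start the watchdog; returns 0 on success. */
int watchdog_start(struct watchdog *wd, pthread_t *tid, long timeout_ms)
{
    assert(timeout_ms > 0);
    atomic_init(&wd->progress, 0);
    atomic_init(&wd->stalled, 0);
    atomic_init(&wd->stop, 0);
    wd->timeout_ms = timeout_ms;
    return pthread_create(tid, NULL, watchdog_main, wd);
}
```

The reader would call `atomic_fetch_add(&wd.progress, 1)` after each read, and whoever supervises the copy would check `wd.stalled`. A thread that is blocked inside a read on a hung drive cannot react to the flag itself, so the reaction (for example prompting for a power cycle) has to come from outside that thread.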

Kendall Helmstetter Gelner
+1, if he literally means "cold booting", putting it in a fridge/freezer isn't that far-fetched. I once had a drive which worked in the opposite manner -- startup would bluescreen a couple times before the drive warmed itself up sufficiently.
GRB
+1 agreed, try to solve HW problem first.
Justicle
I've been told that putting the hard drive in the fridge in a bag for some hours would revive it for long enough to finish the backup. But again, I don't think I have a hardware problem, but a firmware problem.
Spidey
+2  A: 

You might be interested in robocopy ("Robust File Copy"). Robocopy is a command line tool that can tolerate network outages and resume copying where it previously left off (incomplete files are marked with a date stamp of 1980-01-01 and contain a recovery record so Robocopy knows where to continue from).

You know... I like being "lazy"... Here is what I would do:

I would write 2 simple scripts. One of them would start robocopy (with the persistence features turned off) and begin the copying. The other would periodically check whether the drive is still working (maybe by trying to list the contents of the root directory; if that takes more than a few seconds, then it is dead... again...), and if the HDD has stopped working it would restart the machine. Set both scripts to run at login and set up auto-login, so that when the machine reboots the copy continues automatically.
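
The "is it still alive" check could be sketched in C roughly like this, assuming a POSIX system and that the drive is mounted at some directory. The probe runs in a child process, so a hung drive blocks only the child and not the caller. `drive_alive` and its parameters are hypothetical names, not part of robocopy or any library:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns 1 if listing `dir` completes without error within timeout_ms,
 * and 0 if it fails or takes too long.
 */
int drive_alive(const char *dir, long timeout_ms)
{
    assert(dir != NULL && timeout_ms > 0);

    pid_t pid = fork();
    if (pid < 0)
        return 0;                      /* could not even start the probe */

    if (pid == 0) {
        /* Child: list the directory; this is the part that may hang. */
        DIR *d = opendir(dir);
        if (d == NULL)
            _exit(1);
        for (;;) {
            errno = 0;
            if (readdir(d) == NULL)
                break;                 /* end of directory, or an error */
        }
        int failed = (errno != 0);
        closedir(d);
        _exit(failed ? 1 : 0);
    }

    /* Parent: poll for the child's exit instead of blocking on it. */
    const long poll_ms = 10;           /* assumed polling interval */
    struct timespec ts = { 0, poll_ms * 1000000L };
    for (long waited = 0; waited < timeout_ms; waited += poll_ms) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (r < 0)
            return 0;
        nanosleep(&ts, NULL);
    }

    /* Timed out: treat the drive as dead. The kill only takes effect once
     * the child leaves the blocked system call, so do not wait for it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 0;
}
```

A call such as `drive_alive("/mnt/faulty", 3000)` (the mount point is just an example) would stand in for check_harddrive(). One caveat: a directory listing may be answered from the OS cache, so a small read of uncached data might be a more reliable probe.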

Kalmi
I like the robocopy approach, but rebooting the system more than a thousand times doesn't seem like a good idea. I'll stick with the manual version, [dis]connecting the SATA power cable myself.
Spidey
+1  A: 

From a "I need to get my data back" perspective, if your data is really valuable to you, I would recommend sending the drive to a data recovery specialist. Depending on how valuable the data is, the cost (probably several hundred dollars) is trivial. Ideally, you would find a data recovery specialist that doesn't just run some software to do the recovery - if the software approach doesn't work, they should be able to do things like replace the circuit board on the drive, and probably other things (I am not a data recovery specialist).

If the value of the data on the drive doesn't quite rise to that level, you should consider purchasing one of the many pieces of software for data recovery. For example, I personally have used and would recommend GetDataBack from Runtime software http://www.runtime.org. I've used it to recover a failing drive, it worked for me.

And now on to more general information... The standard process for data recovery off of a failing drive is to do as little as possible on the drive itself. You should unplug the drive, and stop attempting to do anything. The drive is failing, and it is likely to get worse and worse. You don't want to play around with it. You need to maximize your chances of getting the data off.

The way the process works is to use software that reads the drive block-by-block (not file-by-file), and makes an image copy of the drive. The software attempts to read every block, and will retry the reads if they fail, and writes an image file which is an image of the entire hard drive.
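
As a rough illustration of that block-by-block process (not how any particular recovery product works), here is a sketch in C. It assumes POSIX `pread`/`pwrite`, retries each block a fixed number of times, and writes zeros for blocks that never read. `image_blocks`, `read_block_fd`, the `bad[]` map and the 4096-byte limit are all hypothetical choices made for this example:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>      /* open() flags, for callers */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BLOCK 4096  /* largest block size this sketch supports */

/* Reader callback: fill buf with block `blk`; 0 on success, -1 on failure. */
typedef int (*read_block_fn)(void *ctx, unsigned long blk,
                             unsigned char *buf, size_t block_size);

/* Reader for a real device or file: ctx points to an int file descriptor. */
int read_block_fd(void *ctx, unsigned long blk,
                  unsigned char *buf, size_t block_size)
{
    int fd = *(int *)ctx;
    off_t pos = (off_t)blk * (off_t)block_size;
    ssize_t n = pread(fd, buf, block_size, pos);
    return n == (ssize_t)block_size ? 0 : -1;
}

/*
 * Copy nblocks blocks into the image file out_fd. Each block is tried up to
 * max_tries times; a block that never reads is written as zeros and marked
 * with 1 in bad[]. Returns the number of bad blocks, or -1 on a write error.
 */
long image_blocks(read_block_fn read_block, void *ctx, int out_fd,
                  unsigned long nblocks, size_t block_size,
                  int max_tries, unsigned char *bad)
{
    assert(block_size > 0 && block_size <= MAX_BLOCK && max_tries > 0);

    unsigned char buf[MAX_BLOCK];
    long bad_count = 0;

    for (unsigned long blk = 0; blk < nblocks; blk++) {
        int ok = 0;
        for (int t = 0; t < max_tries && !ok; t++)
            ok = (read_block(ctx, blk, buf, block_size) == 0);

        if (!ok) {
            memset(buf, 0, block_size);   /* placeholder for lost data */
            bad_count++;
        }
        bad[blk] = (unsigned char)!ok;

        /* Keep every block at its original offset inside the image. */
        off_t pos = (off_t)blk * (off_t)block_size;
        if (pwrite(out_fd, buf, block_size, pos) != (ssize_t)block_size)
            return -1;
    }
    return bad_count;
}
```

Keeping a map of bad blocks means a later pass can retry only those blocks instead of re-reading the whole drive.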

Once the hard drive has been imaged, the software then works against the image to identify the various logical parts of the drive - the partitions, directories, and files. And then it enables you to copy the files off of the image.

The software can typically "deduce" structures from the image. For example, if the partition table is damaged or missing, the software will scan through the entire image, looking for things that might be partitions, and if they look enough like partitions, it will treat them like a partition and see if it can find directories and files. So good software is written using a lot of knowledge about the different structures on the drive.

If you want to learn how to write such software, good for you! My recommendation is that you start with books about how various operating systems organize data on hard drives, so that you can start to get an intuitive feel for how software might work with drive images to pull data from them.

Don Dumitru
My problem isn't in the OS layer, in how the bits on the magnetic disk are abstracted. The problem is the hardware, though not physically: it's just the firmware that is faulty. A professional could easily recover that data, but I'll experiment a little myself before I try something more drastic.
Spidey
+2  A: 

I think the simplest way for you is to copy the entire disk image. Under Linux your disk will appear as a block device, /dev/sdb for example, with its partitions as /dev/sdb1 and so on.

Start copying the disk image until a read error appears. Then wait for the user to "repair" the disk and resume reading from the last position.
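
A sketch of that copy-and-resume step in C, assuming POSIX `pread`/`pwrite`. `copy_from` and the 1 MiB chunk size are hypothetical choices for illustration. The function returns the offset it reached, so the caller can pass that offset back in after the drive has been power cycled:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags, for callers */
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)  /* 1 MiB per read; an arbitrary choice */

/*
 * Copy from src_fd to dst_fd starting at byte `offset`, one chunk at a time.
 * Returns the offset of the first byte NOT yet copied. *done is set to 1 if
 * end of input was reached, or 0 if a read or write error stopped the copy.
 * Uses a static buffer, so it is not safe to call from several threads.
 */
off_t copy_from(int src_fd, int dst_fd, off_t offset, int *done)
{
    static char buf[CHUNK_SIZE];

    assert(offset >= 0 && done != NULL);
    *done = 0;

    for (;;) {
        ssize_t n = pread(src_fd, buf, sizeof buf, offset);
        if (n == 0) {                 /* end of the device or file */
            *done = 1;
            return offset;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return offset;            /* drive failed: report where we are */
        }

        /* Write at the same offset so the image mirrors the disk layout. */
        ssize_t written = 0;
        while (written < n) {
            ssize_t w = pwrite(dst_fd, buf + written,
                               (size_t)(n - written), offset + written);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return offset;        /* this chunk is not counted as copied */
            }
            written += w;
        }
        offset += n;
    }
}
```

The caller would loop: open the device read-only, call `copy_from` with the saved offset, and when it returns with `*done == 0`, store the offset, wait for the power cycle, reopen the device (the old descriptor may no longer be usable) and call it again.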

You can easily mount a disk image file and read its contents; see the -o loop option for mount.

Cool the disk down before use. I've heard that helps.

Viliam
Can I open /dev/sdb1 with a plain FILE handle in C? Then I keep the position in a variable, pass it to fseek, and continue reading until EOF? Will a device file give me EOF? What about mounting: I've read I can mount a partition using the loop device. Could I mount an entire device (such as /dev/sdb) with a loop device, and then mount the partitions with another loop device? When I mount the device image file, how could I access the partitions? I'm asking this just out of curiosity, I have just one partition on this disk anyway.
Spidey
Sure, you can open the whole partition and read it as a file. Since it is a block device, you can use the ordinary functions, including fseek, and EOF works as well. AFAIK you cannot mount a whole disk with its partitions, only a specific partition.
Viliam
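
On the question of reaching partitions inside a whole-disk image: a partition can be located by its byte offset in the image. A small sketch in C, assuming a classic MBR partition table (not GPT) and 512-byte sectors. `mbr_parse`, `mbr_part_offset` and `struct mbr_part` are hypothetical names for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* assumed logical sector size */

struct mbr_part {
    uint8_t  type;       /* partition type byte; 0 means an unused entry */
    uint32_t first_lba;  /* first sector of the partition */
    uint32_t sectors;    /* length of the partition in sectors */
};

/* Decode a 32-bit little-endian value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the four primary entries from the first sector of a disk image.
 * Returns 0 on success, or -1 if the 0x55 0xAA boot signature is missing.
 */
int mbr_parse(const uint8_t *sector, struct mbr_part parts[4])
{
    assert(sector != NULL && parts != NULL);

    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i;  /* table starts at 446 */
        parts[i].type      = e[4];
        parts[i].first_lba = le32(e + 8);
        parts[i].sectors   = le32(e + 12);
    }
    return 0;
}

/* Byte offset of a partition inside the image. */
uint64_t mbr_part_offset(const struct mbr_part *p)
{
    return (uint64_t)p->first_lba * SECTOR_SIZE;
}
```

The resulting byte offset can be given to mount's `offset=` option together with `-o loop`, or used directly with fseek/pread to read the partition out of the image.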