views:

508

answers:

2

In a Linux VM (Vmware workstation or similar), how can I simulate a failure on a previously working disc?

I have a situation happening in production where a disc fails (probably a controller, cable or firmware problem). Obviously this is not predictable or reproducible, I want to test my monitoring to ensure that it alerts correctly.

I'd ideally like to be able to simulate a situation where it fails writes but succeeds reads, as well as a complete failure, i.e. the scsi interface reports errors back to the kernel.

+2  A: 

There are several layers at which a disk error can be simulated. If you are testing a single user-space program, probably the simplest approach is to interpose the appropriate calls (e.g. write()) and have them sometimes return an error. The libfiu fault-injection library can do this using its fiu-run tool.

Another approach is to use a kernel driver that can pass through data to/from another device, but inject faults along the way. You can then mount the device and use it from any application as if it was a faulty disk. The fsdisk driver is an example of this.

There is also a fault injection infrastructure that has been merged in to the Linux kernel, although you will probably need to reconfigure your kernel to enable it. It is documented in Documentation/fault-injection/fault-injection.txt. This is useful for testing kernel code.

It is also possible to use SystemTap to inject faults at the kernel level. See The SCSI fault injection test and Kernel Fault injection using SystemTap.

mark4o
+1  A: 

A simple way to make a SCSI disk disappear with a 2.6 kernel is:

echo 1 > /sys/bus/scsi/devices/H:B:T:L/delete

(H:B:T:L is host, bus, target, LUN). To simulate the read-only case you'll have to use the fault injection methods that mark4o mentioned, though.

caf