Say my program attempts a read of a byte in a file on a ZFS filesystem. ZFS can locate a copy of the necessary block, but cannot locate any copy with a valid checksum (they're all corrupted, or the only disks present have corrupted copies). What does my program see, in terms of the return value from the read, and the byte it tried to read? And is there a way to influence the behavior (under Solaris, or any other ZFS-implementing OS), that is, force failure, or force success, with potentially corrupt data?

+1  A: 

How would returning anything but an EIO error from read() make sense outside a file system specific low level data rescue utility?

The low level data rescue utility would need to use an OS and FS specific API other than open/read/write/close to access the file. The semantics it would need are fundamentally different from reading normal files, so it would need a specialized API.

ndim
Anything which provides its own error correction facilities might want a crack at whatever is left (par2), and some forms of data, like MPEG, can recover from limited errors without a significant impact on the end user experience. (A 1 bit error probably won't even be noticed, but if movie playback ends, you'll definitely notice.)
Jay Kominek
+1 for ndim's answer, it can't be anything other than `EIO`. So your question should probably be: how do I get the raw, uncorrected and corrupted data that was read but didn't make it to `read()`?
Wim
Potentially there are even multiple versions of this block, all wrong in different ways. I expect you'll have to dig deep into the filesystem code to dig these out...
Wim
+2  A: 

EIO is indeed the only answer with current ZFS implementations.
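From the program's side this can be sketched as follows. It is a minimal illustration in Python, not ZFS-specific code: `read_one_byte` and `describe` are hypothetical helper names, and the assumption (taken from the answers in this thread) is that a block with no valid copy surfaces as `errno.EIO` from the underlying `pread()` call, with no byte delivered.

```python
import errno
import os


def read_one_byte(path, offset):
    """Return (data, None) on success or (None, errno_code) on failure."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 1, offset)
    except OSError as exc:
        # Assumption: for a block without a valid checksum, exc.errno is EIO.
        return None, exc.errno
    finally:
        os.close(fd)
    return data, None


def describe(result):
    """Turn the (data, errno_code) pair into a short human-readable message."""
    data, err = result
    if err is None:
        return "read ok: %r" % (data,)
    if err == errno.EIO:
        return "I/O error (EIO): no data returned"
    return "other error: " + os.strerror(err)
```

Calling `describe(read_one_byte("/prueba/dummy_file", 0))` on a damaged file would, under that assumption, report the EIO case rather than any byte value.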

An open ZFS "bug" asks for some way to read corrupted data: http://bugs.opensolaris.org/bugdatabase/printableBug.do?bug%5Fid=6186106

I believe this is already doable using the undocumented but open source zdb utility. Have a look at http://www.cuddletech.com/blog/pivot/entry.php?id=980 for an explanation of how to dump a file's contents using zdb's -R option with the "r" flag.

jlliagre
A: 

Solaris 10:

"""

Create a test pool

[root@tesalia z]# cd /tmp
[root@tesalia tmp]# mkfile 100M zz
[root@tesalia tmp]# zpool create prueba /tmp/zz

Fill the pool

[root@tesalia prueba]# dd if=/dev/zero of=/prueba/dummy_file
dd: writing to `/prueba/dummy_file': No space left on device
129537+0 records in
129536+0 records out
66322432 bytes (66 MB) copied, 1.6093 s, 41.2 MB/s

Unmount (export) the pool

[root@tesalia /]# zpool export prueba

Corrupt the pool on purpose

[root@tesalia prueba]# dd if=/dev/urandom of=/tmp/zz seek=100000 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0715209 s, 7.2 kB/s

Mount the pool again

zpool import -d /tmp prueba

Try to read the corrupted data

[root@tesalia tmp]# md5sum /prueba/dummy_file
md5sum: /prueba/dummy_file: I/O error

Read the manual

[root@tesalia tmp]# man -s2 read
[...]
RETURN VALUES
Upon successful completion, read() and readv() return a non-negative integer indicating the number of bytes actually read. Otherwise, the functions return -1 and set errno to indicate the error.

ERRORS
The read(), readv(), and pread() functions will fail if:
[...]
EIO A physical I/O error has occurred, [...]

"""

You must export and re-import the test pool. Otherwise the direct overwrite (the pool corruption) goes unnoticed, because the file is still cached in OS memory.

And no, currently ZFS will refuse to give you corrupted data. As it should.
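A program that can tolerate gaps (the par2 or MPEG cases mentioned in the comments) can still try to keep whatever is readable. The following is a rough sketch of that idea, similar in spirit to `dd conv=noerror,sync`, and not a ZFS API: it assumes that only reads touching a damaged block fail with EIO while the other blocks remain readable. `salvage` is a hypothetical helper, the 128 KiB block size is an assumed record size that should be checked against the dataset, and the `pread` parameter exists only so the sketch can be exercised without a damaged pool.

```python
import errno
import os

RECORD_SIZE = 128 * 1024  # assumed block size; check the dataset's recordsize


def salvage(src_fd, size, block_size=RECORD_SIZE, pread=os.pread):
    """Read `size` bytes block by block, zero-filling blocks that fail with EIO.

    Returns (data, bad_offsets), where bad_offsets lists the starting offset
    of every block that could not be read.
    """
    out = bytearray()
    bad_offsets = []
    offset = 0
    while offset < size:
        want = min(block_size, size - offset)
        try:
            chunk = pread(src_fd, want, offset)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # only EIO is treated as "damaged block"
            chunk = b"\0" * want  # placeholder for the unreadable block
            bad_offsets.append(offset)
        if not chunk:
            break  # unexpected end of file
        out += chunk
        offset += len(chunk)
    return bytes(out), bad_offsets
```

The zero-filled ranges are placeholders, not recovered data. The returned offsets tell an error-correcting layer which ranges it has to repair. For large files, writing each chunk to an output file instead of accumulating it in memory would be the more practical variant.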

jcea