Suppose I want to clone my hard drive (hda) to another drive (hdb) in the same computer. As I see it, there are two easy, rough, do-it-yourself ways:

cat /dev/hda > /dev/hdb

and

dd if=/dev/hda of=/dev/hdb

What technical reasons are there for the latter often (or always) being said to be better than the former?

Under no circumstances try these commands at home, or your UNIX is fubar'd.

+5  A: 

I think there might be a performance penalty for sending all your data through the pipe. dd does it all in one program, and is probably optimized for block reads and writes.

Neall
+1...but maybe to be clear...only one program is involved with dd, whereas with cat the shell is involved and data is passed through an IPC pipe
epatel
Don't be too sure about that. http://prstat.blogspot.com/2008/04/why-cat1-ran-faster-than-dd1m.html
olle
interesting...but, they are counting syscalls...and, I don't see them counting the calls the shell is doing, as it is the shell doing the writing to disk.
epatel
A more important reason is probably: "cat uses a 8MB write vs. the 8k writes we specified for dd" - it's not surprising that you'd get lots more syscalls if you write in blocks a thousand times smaller!
Brian
Brian: They timed cat vs dd using the same 8MB size. "To equal the playing ground, we ptime(1) both at 8MB IO." Cat was still faster, and if you keep reading that article, you'll find out why.
indiv
@epatel: the shell sets up the output, but doesn't do the writing.
wnoise
+2  A: 

The question should probably be "Is dd better than cat?". The answer is probably going to depend on a lot of other factors. The only real way to find out would be actual tests.

PiedPiper
rsync wouldn't clone the partition table
Henrik Paul
For the purpose of this question, it's fair to assume that they are identical in every aspect.
Henrik Paul
It would make sense to make that clear in your question
PiedPiper
+2  A: 

If I remember correctly, dd is much more "low level" in its approach, skipping such fancy things as filesystems and all the bells and whistles :)

One time, dd literally saved my precious, ehm, life:

Stuck with a Linux box full of "ABSOLUTELY IRREPLACEABLE DATA" from one contractor, with waaaay too many bad sectors, so that the only thing usable was the emergency Linux shell. Did I mention the impossibility of opening up the box to get the drive? Ah, yes, no USB and the like...

And then, the light!!! dd and ftp working!!! Ahhh, refreshing! Saving your day (and career) with a neat command-line incantation to make a backup dump on a remote disk with ftp...

Something like put |dd if=/dev/hda bs=4096 count= ???? or similar...

cat didn't work for me then...

Myrrdyn
A: 

dd should be faster because it does the I/O internally, as opposed to cat, which uses stdio and the shell for I/O.

A: 

There's little practical difference anymore.
Internally they both optimise to use pretty much the same calls.

dd has a number of useful extra features for more complex data copies and might be recognised as more 'normal' for this sort of task by other users. But simple 'cat' is in the essence of unix.

Martin Beckett
+15  A: 

The big difference between just plain 'cat' and 'dd' is that dd ensures that all reads and writes are done with the size you specify. This can be significant, depending on what device (and version of *nix) you're using.
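To make the "size you specify" idea concrete, here is a minimal Python sketch of re-blocking: data arrives in chunks of arbitrary size and is handed on in fixed-size blocks, with only the last block allowed to be short. The function name `reblock` is made up for illustration, and this is not how dd itself is implemented; it only shows the kind of guarantee a tape-like device would care about.

```python
def reblock(chunks, block_size):
    """Yield fixed-size blocks from an iterable of arbitrarily sized byte chunks."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        # Emit as many full blocks as the buffer currently holds.
        while len(buf) >= block_size:
            yield bytes(buf[:block_size])
            del buf[:block_size]
    if buf:
        # Whatever is left at end of input goes out as one short block.
        yield bytes(buf)


if __name__ == "__main__":
    # Three uneven input chunks, re-emitted as 4-byte blocks.
    for block in reblock([b"abc", b"de", b"fghij"], 4):
        print(block)
```

A plain copy loop would instead pass each chunk through at whatever size it happened to arrive.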

nsayer
Very significant, if you are talking to a tape. Less significant with modern devices. But you still sometimes want the detailed control that dd can give you.
Darron
A: 

Interesting... I've seen recommendations in the past to do this sort of copy operation with various permutations of dd, cpio, or tar, but never cat.

I keep wanting to say that it's a matter of cat being text-oriented and the others being binary-data-oriented or that dd can deal more readily with unformatted space, but *nix doesn't really make a text/binary distinction and copying the device file will take the filesystem with the copy, so formatting shouldn't be an issue.

dd does have a lot of additional options for data conversion in the process of transferring it, but I wouldn't expect that to be particularly relevant when transferring the filesystem itself rather than just its contents.

Dave Sherohman
+7  A: 

A few points.

ltrace cat </dev/zero >/dev/null suggests that cat is more efficient by default, as it doesn't memcpy and, more importantly, uses 4KiB buffers.

ltrace dd if=/dev/zero of=/dev/null shows that dd defaults to using 512B buffers, which is very inefficient for reading modern disks (though the kernel should alleviate this somewhat with various disk scheduling). However, dd is much more configurable than cat, and you can use something like bs=2M to reduce the number of syscalls.

dd is problematic in the presence of disk errors, and can hang or, more importantly, ignore unreadable data, thus corrupting the destination disk. Consider dd_rescue or ddrescue for this task instead.
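As a rough illustration of the buffer-size point above, the following Python sketch copies the same amount of data with different block sizes and counts how many read calls each one needs. It works on in-memory buffers, so it counts Python-level `read()` calls rather than real syscalls, and `copy_counting_calls` is a hypothetical helper, not part of cat or dd.

```python
import io


def copy_counting_calls(src, dst, block_size):
    """Copy src to dst in block_size chunks; return (bytes_copied, read_calls)."""
    total = 0
    calls = 0
    while True:
        chunk = src.read(block_size)
        calls += 1  # the final, empty read that signals EOF is counted too
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total, calls


if __name__ == "__main__":
    data = bytes(8 * 1024 * 1024)  # 8 MiB of zeros standing in for a disk
    for bs in (512, 4096, 2 * 1024 * 1024):
        copied, calls = copy_counting_calls(io.BytesIO(data), io.BytesIO(), bs)
        print(f"bs={bs:>8}: {copied} bytes in {calls} read calls")
```

The call count scales inversely with the block size, which is the whole argument for passing a larger bs to dd.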

pixelbeat
+5  A: 

It's really historical, and not that important if all you're going to do is clone one disk to another.

On traditional Unix, disks were accessible through both block devices and character devices, and each had different requirements. 'dd' was needed to interact with the block devices, because 'cat' only knew about character I/O. (Block I/O was particularly important for dealing efficiently with things like tape drives.)

dd can be handy if you need to re-start a long-running copy, because of its skip and seek options.
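For the restart case, a minimal Python sketch of the idea behind skip and seek: jump the same number of blocks into both the input and the output, then carry on copying from there. The helper `resume_copy` and the file names in the demo are invented for illustration; the destination is opened without truncation, so blocks already copied are left alone.

```python
import os
import tempfile


def resume_copy(src_path, dst_path, block_size, blocks_done):
    """Continue a copy from block number blocks_done; return bytes copied."""
    offset = block_size * blocks_done
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)  # like skip=N: pass over N input blocks
        dst.seek(offset)  # like seek=N: pass over N output blocks
        copied = 0
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "source.img")
        dst = os.path.join(d, "target.img")
        data = os.urandom(10 * 512)
        with open(src, "wb") as f:
            f.write(data)
        # Pretend an earlier run was interrupted after 4 blocks.
        with open(dst, "wb") as f:
            f.write(data[: 4 * 512])
        print(resume_copy(src, dst, 512, 4), "bytes copied on resume")
```

This assumes you know how many blocks the interrupted run completed, which is also what you would need to know to pick the skip and seek values.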

Unless you're on a system that still has the block/character distinction for disk access (which Linux doesn't), and unless you need to do something like swap bytes, 'cat' will be fine (and probably faster, because it'll default to larger block sizes than dd).

Note that, unless someone's done some major tinkering in shell design since last I looked, 'cat foo >bar' does not do the writing to 'bar' via the shell; all the shell does is open 'bar' for writing with truncation, then pass the open file descriptor to 'cat' across a fork/exec as file descriptor 1 (stdout). At that point, the shell is out of the loop, and doesn't get involved again, beyond being notified of the exit status of 'cat'.
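The redirection steps just described can be sketched in Python, which exposes the same Unix calls. This is only a simplified model of what a shell does for 'cat foo >bar', with `run_redirected` as a made-up name: the parent opens the target, forks, the child makes that descriptor its stdout and execs the command, and the parent merely waits for the exit status.

```python
import os
import tempfile


def run_redirected(argv, out_path):
    """Run argv with stdout sent to out_path; return the exit status."""
    # The "shell" opens the target for writing, with truncation.
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    pid = os.fork()
    if pid == 0:
        # Child: install the open file as descriptor 1, then become the command.
        try:
            if fd != 1:
                os.dup2(fd, 1)
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: drop its copy of the descriptor and wait; it never touches the data.
    os.close(fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        bar = os.path.join(d, "bar")
        with open(foo, "wb") as f:
            f.write(b"hello\n")
        print("exit status:", run_redirected(["cat", foo], bar))
```

All the bytes written to 'bar' come from write calls made by cat itself, which is why no pipe is involved in the original command.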

fcw
A: 

You want to use dd so that you can specify things like the block size (bs), which is how much to read/write at once; tuning this to some multiple of 4k is going to be much faster than cat, which is, I think, going to be limited by the pipes involved.

pjz