I would like to know what copy-on-write is and what it is used for. The term 'copy-on-write array' is mentioned several times in the Sun JDK tutorials, but I didn't understand what it meant.

+5  A: 

I was going to write up my own explanation but this Wikipedia article pretty much sums it up.

Here is the basic concept:

Copy-on-write (sometimes referred to as "COW") is an optimization strategy used in computer programming. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This sharing can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.
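
To make the idea concrete, here is a minimal sketch in Java. `CowIntArray` and all of its methods are hypothetical names made up for this illustration; it is not a JDK class and not how any particular library implements COW. It assumes single-threaded use and tracks sharing with a simple flag rather than a reference count, so a handle may occasionally make one copy more than strictly necessary.

```java
/** Hypothetical illustration of copy-on-write; not a JDK type and not thread-safe. */
final class CowIntArray {
    private int[] data;      // backing array, possibly shared with other handles
    private boolean shared;  // true while another handle may still point at `data`

    private CowIntArray(int[] data, boolean shared) {
        this.data = data;
        this.shared = shared;
    }

    /** Creates a handle that owns its own backing array. */
    static CowIntArray of(int... values) {
        return new CowIntArray(values.clone(), false);
    }

    /** Cheap "copy": the new handle points at the same backing array. */
    CowIntArray share() {
        shared = true;
        return new CowIntArray(data, true);
    }

    /** Reads never copy. */
    int get(int index) {
        return data[index];
    }

    /** The first write through a sharing handle makes a private copy. */
    void set(int index, int value) {
        if (shared) {
            data = data.clone();
            shared = false;
        }
        data[index] = value;
    }

    /** True if both handles currently use the same backing array. */
    boolean sharesStorageWith(CowIntArray other) {
        return data == other.data;
    }

    public static void main(String[] args) {
        CowIntArray a = CowIntArray.of(1, 2, 3);
        CowIntArray b = a.share();
        System.out.println(a.sharesStorageWith(b)); // true: nothing copied yet
        b.set(0, 99);                               // triggers the private copy
        System.out.println(a.get(0));               // 1: a does not see b's change
        System.out.println(b.get(0));               // 99
    }
}
```

If `b` is only ever read, the backing array is never duplicated, which is exactly the advantage the quoted paragraph describes.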

Here is also a common application of COW:

The COW concept is also used in the maintenance of instant snapshots on database servers like Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of data when the underlying data are updated. Instant snapshots are used for testing or moment-dependent reports and should not be used to replace backups.
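
The "copy-on-write array" from the question behaves in a similar snapshot-like way. As a rough illustration, the sketch below uses the JDK's `java.util.concurrent.CopyOnWriteArrayList`; the `SnapshotDemo` class and its `iterateAcrossWrite` method are names invented for this example. An iterator obtained before a write keeps seeing the old backing array, while the list itself moves on to a fresh copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical demo class showing the snapshot iterator of CopyOnWriteArrayList. */
final class SnapshotDemo {
    /** Returns what an iterator created before a write still sees after the write. */
    static List<String> iterateAcrossWrite(List<String> list) {
        Iterator<String> it = list.iterator(); // fixed view of the current backing array
        list.add("c");                         // the write copies the backing array
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        System.out.println(iterateAcrossWrite(cow)); // [a, b]
        System.out.println(cow);                     // [a, b, c]
    }
}
```

Passing a plain `ArrayList` to the same method would throw `ConcurrentModificationException` from `it.next()`, because its iterator detects the intervening `add`. The price of the copy-on-write variant is that every write copies the whole array, so it suits lists that are read far more often than they are changed.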

Andrew Hare
what is it used for?
hhafez
Anything a regular array is used for. In some situations, though, this strategy performs better, for example when reads far outnumber writes.
Andrew Flanagan
+6  A: 

"Copy on write" means more or less what it sounds like: everyone has a single shared copy of the same data until it's written, and then a copy is made. Usually, copy-on-write is used to resolve concurrency problems. In ZFS, for example, data blocks on disk are allocated copy-on-write; as long as there are no changes, you keep the original blocks; a change copies only the affected blocks. This means the minimum number of new blocks is allocated.

These changes are also usually implemented to be transactional, i.e., they have the ACID properties. This eliminates some concurrency issues, because then you're guaranteed that all updates are atomic.
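
An in-memory analogue of that atomic-update idea can be sketched in Java. This illustrates only the atomicity part, has nothing to do with how ZFS is actually implemented, and `AtomicSnapshotList` is a hypothetical class written for this example. Each update builds a modified copy and then publishes it with a single atomic reference swap, so readers see either the old version or the new one, never a half-finished state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: copy-on-write updates published through one atomic swap. */
final class AtomicSnapshotList {
    // Always points at a complete, unmodifiable version of the list.
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.<String>emptyList());

    /** Readers get a consistent version without taking any lock. */
    List<String> snapshot() {
        return current.get();
    }

    /** Writers copy, modify the copy, then swap it in atomically. */
    void add(String item) {
        // updateAndGet retries the function if another writer got in first,
        // so the function must be free of side effects.
        current.updateAndGet(old -> {
            List<String> copy = new ArrayList<>(old);
            copy.add(item);
            return Collections.unmodifiableList(copy);
        });
    }

    public static void main(String[] args) {
        AtomicSnapshotList list = new AtomicSnapshotList();
        list.add("x");
        List<String> before = list.snapshot();
        list.add("y");
        System.out.println(before);          // [x]
        System.out.println(list.snapshot()); // [x, y]
    }
}
```

A snapshot taken earlier stays valid and unchanged after later writes, which is why readers need no coordination with writers.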

Charlie Martin
+1  A: 

I shall not repeat the same answer on copy-on-write. I think Andrew and Charlie have already made it very clear. I will give you an example from the OS world, just to show how widely this concept is used.

We can use fork() or vfork() to create a new process. On modern Unix-like systems, fork() is typically implemented with copy-on-write: the child initially shares the parent's memory pages, and a page is copied only when one of the two processes writes to it. This speeds up forking. vfork() goes a step further and copies nothing at all: the child runs in the parent's address space while the parent is suspended, until the child calls exec or exits. vfork is therefore meant for the case where exec immediately follows it. When the child calls exec, the image of a new executable is loaded into the child's address space.

Shamik
A: 

It's also used in Ruby 'Enterprise Edition' as a neat way of saving memory.

Chris
A: 

Copy-on-Write is a bad thing in a multithreaded environment.

Optimizations That Aren't (In a Multithreaded World) by Herb Sutter

Piotr Dobrogost
As the Herb Sutter article mentions, using atomic operations is a potential solution, if the platform supports them. Qt widely uses implicit sharing (Qt's terminology for CoW) for most of its classes.
Ariya Hidayat
Sutter's article says that CoW can be bad if used to optimize a data structure under the covers. It does not say that CoW is always bad in a threaded environment. That certainly is not true; look at Clojure or Concurrent Haskell, for example. They use CoW to make multithreading easier and more efficient.
Ville Laurikari
There is no excuse whatsoever to not implement CoW on any platform that has fast atomic operations. None. The cost of doing a linear copy is a whole lot more than taking a simple lock (even if it's just a good ol' OS mutex) and incrementing the reference counter. Explicit sharing is worse, as it is cumbersome and error-prone. Developer time is far more valuable than application runtime. Qt does an excellent job of providing CoW for nearly all of its classes (if not all), and that means performance is excellent with little effort on my side.
iconiK
A: 

Just to provide another example, Mercurial uses copy-on-write to make cloning local repositories a really "cheap" operation.

The principle is the same as the other examples, except that you're talking about physical files instead of objects in memory. Initially, a clone is not a duplicate but a hard link to the original. As you change files in the clone, copies are written to represent the new version.

harpo