I'm looking for the Unix equivalent of Win32's CopyFile. I don't want to reinvent the wheel by writing my own version.

+1  A: 

One option is to use system() to execute cp, which simply re-uses the cp(1) command to do the work. If you only need another link to the file rather than a copy, this can be done with link() or symlink().
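
To make the link() / symlink() option concrete, here is a minimal sketch. The helper name `make_link` is hypothetical (it is not from the answer), and the fallback policy is only one possible choice: try a hard link first, and fall back to a symbolic link only when link() reports EXDEV, i.e. the two paths are on different file systems. Neither call copies any data.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: make `newpath` refer to the same file as `oldpath`
 * without copying data. Returns 0 on success, -1 on failure (errno set). */
int make_link(const char *oldpath, const char *newpath)
{
    if (link(oldpath, newpath) == 0)
        return 0;                 /* hard link created */

    if (errno != EXDEV)
        return -1;                /* e.g. EEXIST, EACCES: report it */

    /* Hard links cannot cross file systems, so fall back to a symlink.
     * A relative oldpath is resolved relative to the directory that
     * holds newpath, so an absolute oldpath is the safer input here. */
    return symlink(oldpath, newpath);
}
```

Keep in mind that a link is not a copy: a later write through either name changes the one shared file.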

ConcernedOfTunbridgeWells
beware that system() is a security hole.
plinth
You said "if", but link won't work across file systems, FYI.
Roboprog
Really? Would you use this in production code? I can't think of a good reason not to but it doesn't strike me as a _clean_ solution.
Motti
If you specify the path to /bin/cp you're relatively safe, unless the attacker has managed to compromise the system to the extent that they can make modifications to arbitrary system shell utilities in /bin. If they've compromised the system to that extent you've got far bigger problems.
ConcernedOfTunbridgeWells
Using system to run commands is fairly common in unix-land. With proper hygiene it can be reasonably secure and robust. After all, the commands are designed to be used in this way.
ConcernedOfTunbridgeWells
@Roboprog - true; if you need to cross filesystems you would need symlink().
ConcernedOfTunbridgeWells
What will happen if the user creates a file name like "somefile;rm /bin/*"? system() executes the command with sh -c so the text of the entire string is executed by the shell, which means you'd get anything after a semicolon executed as a command - stinks if your code is running setuid too. This is not unlike Bobby Tables (http://xkcd.com/327/). For the trouble it would take to fully sanitize system() you could instead do the fork/exec pair directly on /bin/cp with the correct arguments.
plinth
plinth: I agree that using `system()` in this way is generally a bad idea, but note that `/` is one of the only two characters *not* allowed in a UNIX filename.
caf
Alas, in sanitizing for a system call, 'taint perl :-(
Roboprog
+1  A: 

This is tagged as C, but if you're actually in C++ and just mis-tagged the question, there is: http://stackoverflow.com/questions/829468/how-to-perform-boostfilesystem-copyfile-with-overwrite

In C, maybe there is something in GLib.

Chris H
+1  A: 
sprintf( cmd, "/bin/cp -p \'%s\' \'%s\'", old, new);

system( cmd);

Add some error checks...

Otherwise, open both and loop on read/write, but probably not what you want.

Roboprog
Dang, I've got to learn to "submit" faster :-)
Roboprog
This does not work for files that have spaces (or quotes, backslashes, dollar signs, etc.) in the name. I use spaces in file names fairly often.
Dietrich Epp
Ouch. That's right. Add backslash-single-quotes around the file names in the sprintf().
Roboprog
OK, this is a swiss cheese (see valid security concerns in comments elsewhere), but if you have a relatively controlled environment, it might have some use.
Roboprog
You have a shell code injection vulnerability if you do not properly handle single quote characters in the values of `old` or `new`. A little more effort to use fork and do your own exec can avoid all these problems with quoting.
Chris Johnsen
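
For anyone who still wants to go through system(), here is a rough sketch of the single-quote handling mentioned above. The helper name `shell_quote` is hypothetical. It wraps the string in single quotes and rewrites each embedded single quote as `'\''`, which is the usual idiom for POSIX sh. Treat it as an illustration of the quoting rule, not as a vetted sanitizer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of `s` quoted for POSIX sh.
 * The caller frees the result. Returns NULL if allocation fails. */
char *shell_quote(const char *s)
{
    size_t len = 2;               /* opening and closing quote */
    const char *p;
    char *out, *q;

    /* Each embedded ' expands to the four characters '\'' */
    for (p = s; *p; p++)
        len += (*p == '\'') ? 4 : 1;

    out = malloc(len + 1);
    if (!out)
        return NULL;

    q = out;
    *q++ = '\'';
    for (p = s; *p; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4); /* close quote, escaped ', reopen quote */
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

The quoted strings would then go into the command with something like `snprintf(cmd, sizeof cmd, "/bin/cp -p -- %s %s", q_old, q_new)`, where the `--` keeps a file name that starts with a dash from being read as an option. The fork/exec route avoids the whole problem.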
Yep, simple obvious and wrong, in many cases. Which is why I up-voted some of the more elaborate examples.
Roboprog
+4  A: 

There is no baked-in equivalent of the CopyFile function in the APIs. But sendfile can be used to copy a file in kernel mode, which is a faster and better solution (for numerous reasons) than opening a file, looping over it to read into a buffer, and writing the output to another file.

Here's some code I grabbed from a project I'm working on:

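#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 on success, 0 on failure. Uses the OS X sendfile() prototype. */
static inline int BLCopyFile(const char* source, const char* destination)
{
    //Here we use kernel-space copying for performance reasons

    int input, output;

    if( (input = open(source, O_RDONLY)) == -1)
        return 0;

    //O_CREAT needs a mode argument; 0666 is filtered by the umask
    if( (output = open(destination, O_WRONLY | O_CREAT, 0666)) == -1)
    {
        close(input);
        return 0;
    }

    //On OS X a length of 0 means "send until end of file"
    off_t bytesCopied = 0;

    int result = sendfile(output, input, 0, &bytesCopied, 0, 0) != -1;

    close(input);
    close(output);

    return result;
}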
Computer Guru
According to the man page, the output argument of `sendfile` must be a socket. Are you sure this works?
Jay Conrod
The prototype from my man page (OS X): `int sendfile(int fd, int s, off_t offset, off_t *len, struct sf_hdtr *hdtr, int flags);` The output param is fd - file descriptor. At any rate, I tested it quickly (hence the updated non-C++ version) and it worked :)
Computer Guru
For Linux, Jay Conrod is right - the `out_fd` of `sendfile` could be a regular file in 2.4 kernels, but it now must support the `sendpage` internal kernel API (which essentially means pipe or socket). `sendfile` is implemented differently on different UNIXes - there's no standard semantics for it.
caf
@Computer Guru: The prototype under Linux is different from the one on OS X, which is what I suspected when I saw the extra sendfile parameters in your implementation. It is platform dependent, which is worth bearing in mind!
tommieb75
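
For comparison, a Linux-only sketch of the same idea. The function name `copy_file_linux` is hypothetical. It uses the Linux prototype from <sys/sendfile.h> and assumes a kernel where `out_fd` may be a regular file (the Linux man page says 2.6.33 and later). On older kernels, or on other UNIXes, it will not work as written.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical Linux-only helper. Returns 0 on success, -1 on failure. */
int copy_file_linux(const char *source, const char *destination)
{
    struct stat st;
    off_t offset = 0;
    int in, out, rc = -1;

    if ((in = open(source, O_RDONLY)) == -1)
        return -1;

    /* fstat() the descriptor, so the size belongs to the file we opened */
    if (fstat(in, &st) == -1) {
        close(in);
        return -1;
    }

    out = open(destination, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out == -1) {
        close(in);
        return -1;
    }

    /* sendfile() may transfer fewer bytes than asked for, so loop;
     * it advances `offset` by the number of bytes it copied. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out, in, &offset, (size_t)(st.st_size - offset));
        if (n <= 0)
            break;                /* error, or input ended early */
    }
    if (offset == st.st_size)
        rc = 0;

    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Code that has to run on several UNIXes would still need a read/write fallback, since the prototypes and the restrictions on the descriptors differ between platforms.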
FYI - you can save a lot of work with an `if (PathsMatch(source, destination)) return 1; /* where PathsMatch is the appropriate path comparison routine for the locale */`; otherwise I imagine that the second open would fail.
plinth
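
One possible way to write a PathsMatch-style check without comparing path strings at all is to compare device and inode numbers. This is only a sketch, and the name `same_file` is hypothetical: it treats two paths as the same file when stat() reports the same `st_dev` and `st_ino` for both.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same file,
 * 0 if they differ or if either path cannot be stat()ed. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return 0;

    /* Same device and same inode means same underlying file, which
     * also catches hard links and (since stat() follows them) symlinks. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Because the check and the copy are separate steps, the answer can go stale in between, so this is a convenience guard and not a guarantee.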
+3  A: 
tommieb75
I am not 100% sure about the sendfile prototype, I think I got one of the parameters wrong... please bear that in mind... :)
tommieb75
+1, good one (reusable routine and all)
Roboprog
You have a race condition - the file you have open as `fdSource` and the file you have `stat()ed` are not necessarily the same.
caf
@caf: Can you give more details as I am looking at it and how can there be a race condition? I will amend the answer accordingly..thanks for letting me know...
tommieb75
tommieb75: Simple - in between the `open()` call and the `stat()` call, someone else could have renamed the file and put a different file under that name - so you will copy the data from the first file, but using the length of the second one.
caf
@caf: Holy moly....why didn't I think of that...well spotted...a lock should do the trick on the source file...well done for spotting that...race condition..well I never...as Clint Eastwood in 'Gran Torino' says 'J.C all friday...'
tommieb75
A lock doesn't help (they're not mandatory), but `fstat` can be used in this case to fix it.
caf
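
A small sketch of the `fstat` fix described above, with a hypothetical helper name `open_with_size`: open the file first, then ask for the size through the descriptor. Whatever happens to the path afterwards, the size then belongs to the same file the data will be read from.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and store the size of the
 * file that was actually opened in *size.
 * Returns the descriptor, or -1 on failure. */
int open_with_size(const char *path, off_t *size)
{
    struct stat st;
    int fdSource = open(path, O_RDONLY);

    if (fdSource == -1)
        return -1;

    /* fstat() works on the open descriptor, not on the name, so a
     * rename or replacement of `path` after open() cannot affect it. */
    if (fstat(fdSource, &st) == -1) {
        close(fdSource);
        return -1;
    }

    *size = st.st_size;
    return fdSource;
}
```

The file can still grow or shrink while it is being copied, so a copy loop should keep going until read() or sendfile() reports end of file and not rely on the size alone.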
@caf: Damnnit..... I just saw your comment after I edited my answer in the code.... dang.... LOL!!!!
tommieb75
@Caf: Feel free to edit the code if you wish! :)
tommieb75
+3  A: 

It's straightforward to use fork/execl to run cp to do the work for you. This has advantages over system() in that it is not prone to a Bobby Tables attack and you don't need to sanitize the arguments to the same degree. Further, since system() requires you to cobble together the command string and fork/execl does not, you are less likely to have a buffer overflow issue due to sloppy sprintf() checking.

The advantage of calling cp directly instead of writing your own copy is not having to worry about whether elements of the target path exist in the destination. Doing that in roll-your-own code is error-prone and tedious.

I wrote this example in ANSI C and only stubbed out the barest error handling; other than that it's straightforward code.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void copy(char *source, char *dest)
{
    int childExitStatus;
    pid_t pid;

    if (!source || !dest) {
        /* handle as you wish */
        return;
    }

    pid = fork();

    if (pid == 0) { /* child */
        execl("/bin/cp", "/bin/cp", source, dest, (char *)0);
        _exit(127); /* only reached if execl() itself failed */
    }
    else if (pid < 0) {
        /* error - couldn't start process - you decide how to handle */
    }
    else {
        /* parent - wait for child - this has all error handling, you
         * could just call wait() as long as you are only expecting to
         * have one child process at a time.
         */
        /* flags of 0 block until the child ends; WNOHANG would return at once */
        pid_t ws = waitpid( pid, &childExitStatus, 0);
        if (ws == -1)
        { /* error - handle as you wish */
            return;
        }

        if( WIFEXITED(childExitStatus)) /* exit code in childExitStatus */
        {
            int status = WEXITSTATUS(childExitStatus); /* zero is normal exit */
            /* handle non-zero as you wish */
            (void)status;
        }
        else if (WIFSIGNALED(childExitStatus)) /* killed */
        {
        }
        else if (WIFSTOPPED(childExitStatus)) /* stopped (only reported with WUNTRACED) */
        {
        }
    }
}
plinth
+1 for another long, detailed, slog. Really makes you appreciate the "vector"/list form of system() in perl. Hmm. Maybe a system-ish function with an argv array would be nice to have?!?
Roboprog
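
Something close to a system()-with-argv already exists in POSIX as posix_spawn(). A minimal sketch, with a hypothetical wrapper name `run_argv`: it starts argv[0] directly with the given argument vector, so no shell ever parses the file names, and then waits for the exit status.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical wrapper: run the program at argv[0] with a NULL-terminated
 * argv and no shell. Returns the exit status (0..255), or -1 if the
 * program could not be started or was killed by a signal. */
int run_argv(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn() returns an error number directly; it does not set errno */
    int err = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Block until the child ends, retrying if a signal interrupts the wait */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A copy would then look like `char *args[] = { "/bin/cp", "--", source, dest, NULL }; run_argv(args);`. Some implementations report a failed exec as exit status 127 from the child and not as an error from posix_spawn(), so a caller should check for both.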
+3  A: 

There is no need to either call non-portable APIs like sendfile, or shell out to external utilities. The same method that worked back in the 70s still works now:

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

int cp(const char *to, const char *from)
{
    int fd_to, fd_from;
    char buf[4096];
    ssize_t nread;
    int saved_errno;

    fd_from = open(from, O_RDONLY);
    if (fd_from < 0)
        return -1;

    fd_to = open(to, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd_to < 0)
        goto out_error;

    while (nread = read(fd_from, buf, sizeof buf), nread > 0)
    {
        char *out_ptr = buf;
        ssize_t nwritten;

        do {
            nwritten = write(fd_to, out_ptr, nread);

            if (nwritten >= 0)
            {
                nread -= nwritten;
                out_ptr += nwritten;
            }
            else if (errno != EINTR)
            {
                goto out_error;
            }
        } while (nread > 0);
    }

    if (nread == 0)
    {
        if (close(fd_to) < 0)
        {
            fd_to = -1;
            goto out_error;
        }
        close(fd_from);

        /* Success! */
        return 0;
    }

  out_error:
    saved_errno = errno;

    close(fd_from);
    if (fd_to >= 0)
        close(fd_to);

    errno = saved_errno;
    return -1;
}
caf
@Caf: OMG....g.o.t.o..... :) Your code is saner than mine anyway... ;) The old loop with read/write is the most portable... +1 from me...
tommieb75
I find controlled use of `goto` can be useful to consolidate the error handling path in one place.
caf