tags:

views:

233

answers:

1

I've run into the need to be able refer to a directory by path given its file descriptor in Linux. The path doesn't have to be canonical, it just has to be functional so that I can pass it to other functions. So, taking the same parameters as passed to a function like fstatat(), I need to be able to call a function like getxattr() which doesn't have a f-XYZ-at() variant.

So far I've come up with these solutions; though none are particularly elegant.

The simplest solution is to avoid the problem by calling openat() and then using a function like fgetxattr(). This works, but not in every situation. So another method is needed to fill the gaps.

The next solution involves looking up the information in proc:

if (!access("/proc/self/fd",X_OK)) {
    sprintf(path,"/proc/self/fd/%i/",fd);
}

This, of course, totally breaks on systems without proc, including some chroot environments.

The last option, a more portable but potentially-race-condition-prone solution, looks like this:

DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);

The obvious problem here is that in a multithreaded app, changing the working directory around could have side effects.

However, the fact that it works is compelling: if I can get the path of a directory by calling fchdir() followed by getcwd(), why shouldn't I be able to just get the information directly: fgetcwd() or something. Clearly the kernel is tracking the necessary information.

So how do I get to it?


Answer

The way Linux implements getcwd in the kernel is this: it starts at the directory entry in question and prepends the name of the parent of that directory to the path string, and repeats that process until it reaches the root. This same mechanism can be easily implemented in user-space.

Thanks to Jonathan Leffler for pointing this algorithm out. Here is a link to the kernel implementation of this function: http://lxr.linux.no/linux+v2.6.33/fs/dcache.c#L1905

+1  A: 

The kernel thinks of directories differently from the way you do - it thinks in terms of inode numbers. It keeps a record of the inode number (and device number) for the directory, and that is all it needs as the current directory. The fact that you sometimes specify a name to it means it goes and tracks down the inode number corresponding to that name, but it preserves only the inode number because that's all it needs.

So, you will have to code a suitable function. You can open a directory directly with open() precisely to get a file descriptor that can be used by fchdir(); you can't do anything else with it on many modern systems. You can also fail to open the current directory; you should be testing that result. The circumstances where this happens are rare, but not non-existent. (A SUID program might chdir() to a directory that the SUID privileges permit, but then drop the SUID privileges leaving the process unable to read the directory; the getcwd() call will fail in such circumstances too - so you must error check that, too!) Also, if a directory is removed while your (possibly long-running) process has it open, then a subsequent getcwd() will fail.

Always check results from system calls; there are usually circumstances where they can fail, even though it is dreadfully inconvenient of them to do so. There are exceptions - getpid() is the canonical example - but they are few and far between. (OK: not all that far between - getppid() is another example, and it is pretty darn close to getpid() in the manual; and getuid() and relatives are also not far off in the manual.)

Multi-threaded applications are a problem; using chdir() is not a good idea in those. You might have to fork() and have the child evaluate the directory name, and then somehow communicate that back to the parent.


bignose asks:

This is interesting, but seems to go against the querent's reported experience: that getcwd knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?

For this, it helps to understand how - or at least one mechanism by which - the getcwd() function can be written. Ignoring the issue of 'no permission', the basic mechanism by which it works is:

  • Use stat on the root directory '/' (so you know when to stop going upwards).
  • Use stat onthe current directory '.' (so you know where you are); this gives you a current inode.
  • Until you reach the root directory:
  • Scan the parent directory '..' until you find the entry with the same inode as the current inode; this gives you the next component name of the directory path.
  • And then change the current inode to the inode of '.' in the parent directory.
  • When you reach root, you can build the path.

Here is an implementation of that algorithm. It is old code (originally 1986; the last non-cosmetic changes were in 1998) and doesn't make use of fchdir() as it should. It also works horribly if you have NFS automounted file systems to traverse - which is why I don't use it any more. However, this is roughly equivlaent to the basic scheme used by getcwd(). (Ooh; I see a 18 character string ("../123456789.abcd") - well, back when it was written, the machines I worked on only had the very old 14-character only filenames - not the modern flex names. Like I said, it is old code! I haven't seen one of those file systems in what, 15 years or so - maybe longer. There is also some code to mess with longer names. Be cautious using this.)


/*
@(#)File:           $RCSfile: getpwd.c,v $
@(#)Version:        $Revision: 2.5 $
@(#)Last changed:   $Date: 2008/02/11 08:44:50 $
@(#)Purpose:        Evaluate present working directory
@(#)Author:         J Leffler
@(#)Copyright:      (C) JLSS 1987-91,1997-98,2005,2008
@(#)Product:        :PRODUCT:
*/

/*TABSTOP=4*/

#define _POSIX_SOURCE 1

#include "getpwd.h"

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

#if defined(_POSIX_SOURCE) || defined(USG_DIRENT)
#include "dirent.h"
#elif defined(BSD_DIRENT)
#include <sys/dir.h>
#define dirent direct
#else
What type of directory handling do you have?
#endif

#define PUBLIC
#define PRIVATE     static
#define RECURSIVE
#define READONLY    0
#define DIRSIZ      256

typedef struct stat   Stat;

static Stat root;

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_getpwd_c[] = "@(#)$Id: getpwd.c,v 2.5 2008/02/11 08:44:50 jleffler Exp $";
#endif /* lint */

/* -- Routine: inode_number */

static ino_t   inode_number(char *path, char *name)
{
    ino_t           inode;
    Stat            st;
    char            buff[DIRSIZ + 6];

    strcpy(buff, path);
    strcat(buff, "/");
    strcat(buff, name);
    if (stat(buff, &st))
        inode = 0;
    else
        inode = st.st_ino;
    return(inode);
}

/*
    -- Routine: finddir
    Purpose:    Find name of present working directory

    Given:
        In:  Inode of current directory
        In:  Device for current directory
        Out: pathname of current directory
        In:  Length of buffer for pathname

    Maintenance Log
    ---------------
    10/11/86  JL    Original version stabilised
    25/09/88  JL    Rewritten to use opendir/readdir/closedir
    25/09/90  JL    Modified to pay attention to length
    10/11/98  JL    Convert to prototypes

*/
static int finddir(ino_t inode, dev_t device, char *path, size_t plen)
{
    register char  *src;
    register char  *dst;
    char           *end;
    DIR            *dp;
    struct dirent  *d_entry;
    Stat            dotdot;
    Stat            file;
    ino_t           d_inode;
    int             status;
    static char     name[] = "../123456789.abcd";
    char            d_name[DIRSIZ + 1];

    if (stat("..", &dotdot) || (dp = opendir("..")) == 0)
        return(-1);
    /* Skip over "." and ".." */
    if ((d_entry = readdir(dp)) == 0 ||
        (d_entry = readdir(dp)) == 0)
    {
        /* Should never happen  */
        closedir(dp);
        return(-1);
    }

    status = 1;
    while (status)
    {
        if ((d_entry = readdir(dp)) == 0)
        {
            /* Got to end of directory without finding what we wanted */
            /* Probably a corrupt file system */
            closedir(dp);
            return(-1);
        }
        else if ((d_inode = inode_number("..", d_entry->d_name)) != 0 &&
                 (dotdot.st_dev != device))
        {
            /* Mounted file system */
            dst = &name[3];
            src = d_entry->d_name;
            while ((*dst++ = *src++) != '\0')
                ;
            if (stat(name, &file))
            {
                /* Can't stat this file */
                continue;
            }
            status = (file.st_ino != inode || file.st_dev != device);
        }
        else
        {
            /* Ordinary directory hierarchy */
            status = (d_inode != inode);
        }
    }
    strncpy(d_name, d_entry->d_name, DIRSIZ);
    closedir(dp);

    /**
    ** NB: we have closed the directory we are reading before we move out of it.
    ** This means that we should only be using one extra file descriptor.
    ** It also means that the space d_entry points to is now invalid.
    */
    src = d_name;
    dst = path;
    end = path + plen;
    if (dotdot.st_ino == root.st_ino && dotdot.st_dev == root.st_dev)
    {
        /* Found root */
        status = 0;
        if (dst < end)
            *dst++ = '/';
        while (dst < end && (*dst++ = *src++) != '\0')
            ;
    }
    else if (chdir(".."))
        status = -1;
    else
    {
        /* RECURSE */
        status = finddir(dotdot.st_ino, dotdot.st_dev, path, plen);
        (void)chdir(d_name);    /* We've been here before */
        if (status == 0)
        {
            while (*dst)
                dst++;
            if (dst < end)
                *dst++ = '/';
            while (dst < end && (*dst++ = *src++) != '\0')
                ;
        }
    }

    if (dst >= end)
        status = -1;
    return(status);
}

/*
    -- Routine: getpwd

    Purpose:    Evaluate name of current directory

    Maintenance Log
    ---------------
    10/11/86  JL    Original version stabilised
    25/09/88  JL    Short circuit if pwd = /
    25/09/90  JL    Revise interface; check length
    10/11/98  JL    Convert to prototypes

    Known Bugs
    ----------
    1.  Uses chdir() and could possibly get lost in some other directory
    2.  Can be very slow on NFS with automounts enabled.

*/
char    *getpwd(char *pwd, size_t plen)
{
    int             status;
    Stat            here;

    if (pwd == 0)
        pwd = malloc(plen);
    if (pwd == 0)
        return (pwd);

    if (stat("/", &root) || stat(".", &here))
        status = -1;
    else if (root.st_ino == here.st_ino && root.st_dev == here.st_dev)
    {
        strcpy(pwd, "/");
        status = 0;
    }
    else
        status = finddir(here.st_ino, here.st_dev, pwd, plen);
    if (status != 0)
        pwd = 0;
    return (pwd);
}

#ifdef TEST

#include <stdio.h>

/*
    -- Routine: main
    Purpose:    Test getpwd()

    Maintenance Log
    ---------------
    10/11/86  JL    Original version stabilised
    25/09/90  JL    Modified interface; use GETCWD to check result

*/
int main(void)
{
    char            pwd[512];
    int             pwd_len;

    if (getpwd(pwd, sizeof(pwd)) == 0)
        printf("GETPWD failed to evaluate pwd\n");
    else
        printf("GETPWD: %s\n", pwd);
    if (getcwd(pwd, sizeof(pwd)) == 0)
        printf("GETCWD failed to evaluate pwd\n");
    else
        printf("GETCWD: %s\n", pwd);
    pwd_len = strlen(pwd);
    if (getpwd(pwd, pwd_len - 1) == 0)
        printf("GETPWD failed to evaluate pwd (buffer is 1 char short)\n");
    else
        printf("GETPWD: %s (but should have failed!!!)\n", pwd);
    return(0);
}

#endif /* TEST */
Jonathan Leffler
This is interesting, but seems to go against the querent's reported experience: that `getcwd` knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?
bignose
This is very interesting and got me thinking. I went and had a look through the kernel sources, and this is **exactly** how the kernel does a getcwd lookup. So you're spot on. I traced it to a function called `__d_path` in fs/dcache.c. Have a look for yourself: http://lxr.linux.no/#linux+v2.6.33/fs/dcache.c#L1905
tylerl