In the 2nd edition of "The C Programming Language" by Kernighan and Ritchie, the authors implement a simplified version of the UNIX command ls (section 8.6 "Example - Listing Directories", p. 179). For this purpose they create the following interface, which provides system-independent access to the name and inode number of the files stored in a directory.

#define NAME_MAX 14   /* longest filename component; */
                              /* system dependent */

typedef struct {      /* portable directory entry */
    long ino;                 /* inode number */
    char name[NAME_MAX+1];    /* name + '\0' terminator */
} Dirent;

typedef struct {      /* minimal DIR: no buffering, etc. */
    int fd;                   /* file descriptor for directory */
    Dirent d;                 /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

Then they implement this interface for Version 7 and System V UNIX systems.

  • opendir() basically uses the system call open() to open a directory and malloc() to allocate space for a DIR structure. The file descriptor returned by open() is then stored in the fd member of that DIR. Nothing is stored in the Dirent component.

  • readdir() uses the system call read() to get the next (system-dependent) directory entry of an opened directory and copies the inode number and filename obtained this way into a static Dirent structure, to which it returns a pointer. The only information readdir() needs is the file descriptor stored in the DIR structure.
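To make the opendir() description concrete, here is a minimal sketch of my own, not the book's listing. The names kr_opendir, kr_closedir, KR_DIR and KR_Dirent are hypothetical and were chosen only to avoid clashing with the system's <dirent.h>; the fstat() directory check is not part of the summary above. Note that only the fd member is ever set, which is exactly what prompts the question below.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define KR_NAME_MAX 14             /* same limit as in the interface above */

typedef struct {                   /* hypothetical stand-in for Dirent */
    long ino;
    char name[KR_NAME_MAX + 1];
} KR_Dirent;

typedef struct {                   /* hypothetical stand-in for DIR */
    int fd;
    KR_Dirent d;
} KR_DIR;

/* Open dirname and wrap its file descriptor in a heap-allocated KR_DIR.
   Returns NULL if the path cannot be opened or is not a directory. */
KR_DIR *kr_opendir(const char *dirname)
{
    int fd;
    struct stat stbuf;
    KR_DIR *dp;

    assert(dirname != NULL);
    if ((fd = open(dirname, O_RDONLY)) == -1)
        return NULL;
    if (fstat(fd, &stbuf) == -1 || !S_ISDIR(stbuf.st_mode)
        || (dp = malloc(sizeof *dp)) == NULL) {
        close(fd);                 /* do not leak the descriptor on failure */
        return NULL;
    }
    dp->fd = fd;                   /* the only member that is set; dp->d is untouched */
    return dp;
}

/* Close the descriptor and release the KR_DIR itself. */
void kr_closedir(KR_DIR *dp)
{
    if (dp != NULL) {
        close(dp->fd);
        free(dp);
    }
}
```

A readdir() counterpart is left out on purpose, since it would have to call read() on the directory's file descriptor, which no longer works on modern systems.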

Now to my question: What is the point of having a DIR structure? If my understanding of this program is correct, the Dirent component of DIR is never used, so why not replace the whole structure with a file descriptor and directly use open() and close()?

Thanks.

PS: I am aware that on modern UNIX systems read() can no longer be used on directories (I have tried out this program on Ubuntu 10.04), but I still want to make sure that I have not overlooked something important in this example.

+3  A: 

From K&R:

Regrettably, the format and precise contents of a directory are not the same on all versions of the system. So we will divide the task into two pieces to try to isolate the non-portable parts. The outer level defines a structure called a Dirent and three routines opendir, readdir, and closedir to provide system-independent access to the name and inode number in a directory entry.

So the reason is portability. They want to define an interface that can survive on systems that have different stat structs or nonstandard open() and close(). They go on to build a bunch of reusable tools around it, which don't even care if they're on a Unix-like system. That's the point of wrappers.

Maybe it's not used because they started out by defining their data structures (with a Dirent inside DIR) but ended up not using it. Keeping data structures grouped like that is good design.

Nathon
Portability was also my first guess, but after thinking it over I don't see how `DIR` could contribute to this. The only relevant information it can pass to `readdir()` is the file descriptor. I still don't see the use of the `Dirent` component in `DIR`. Regardless of the system, any implementation of `readdir()` can have a static `Dirent` to which it can return a pointer, so this should not be a portability issue. It is true that `dirwalk()` accesses the contents of a `Dirent`, but this is the static one from `readdir()`, not the one contained in `DIR`. Am I missing something?
qfab
Hey, it looks like you're right. My guess is that they started out by defining their data structures (with a `Dirent` inside `DIR`) but ended up not using it. Grouping related data together in structs is good juju. A good exercise would be to rewrite the code to make use of `DIR.d` instead of having `readdir()`'s callers have their own `Dirent` pointers.
Nathon
Yes, this is a plausible explanation. But considering that the book was published over 20 years ago (2nd edition), it is strange that something like this is not mentioned in the [errata](http://cm.bell-labs.com/cm/cs/cbook/2ediffs.html).
qfab
Strange, yes. But not unheard of. The list of errata you linked was last updated in October of 2006, which means they found an error 18 years after it was published. Plus, it isn't actually broken code; just some wasted memory.
Nathon
That is true, the code is not broken, but it is quite confusing for a beginner like me.
qfab
@Nathon: Could you edit your answer so that it contains the assumption you made in the comments? I would then set it as my accepted answer.
qfab
A: 

It is so they don't have to allocate memory for the Dirent structure that is returned by readdir. This way they can reuse the Dirent between subsequent calls to readdir.

Burton Samograd
But they don't have to allocate memory anyway because `readdir()` stores the `Dirent` as a static variable.
qfab
@qfab: Yes but that's a really bad design. A hypothetical improved implementation would put the buffer inside the `DIR` structure so that simultaneous reading of multiple directories would not clobber the data (and so it would be thread-safe as long as you don't use a single `DIR` object from more than one thread at a time). I expect modern implementations do this; mine certainly does.
R..
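The per-handle buffer idea from the last comment can be sketched as follows. This is an illustration under stated assumptions, not the book's code and not any particular libc's implementation: MyDIR, MyDirent, my_opendir, my_readdir, my_closedir and MY_NAME_MAX are hypothetical names, and the system-dependent layer is replaced by the POSIX opendir()/readdir() calls from <dirent.h>, because read() on a directory does not work on modern systems.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

#define MY_NAME_MAX 255            /* assumed limit for this sketch */

typedef struct {                   /* portable directory entry */
    long ino;
    char name[MY_NAME_MAX + 1];
} MyDirent;

typedef struct {
    DIR *sys;                      /* underlying system directory stream */
    MyDirent d;                    /* entry buffer owned by this handle */
} MyDIR;

/* Open a directory; returns NULL on failure. */
MyDIR *my_opendir(const char *dirname)
{
    MyDIR *dp = malloc(sizeof *dp);

    if (dp == NULL)
        return NULL;
    if ((dp->sys = opendir(dirname)) == NULL) {
        free(dp);
        return NULL;
    }
    return dp;
}

/* Copy the next entry into dp->d and return a pointer to it.
   No static storage is involved, so two open handles never share a buffer. */
MyDirent *my_readdir(MyDIR *dp)
{
    struct dirent *e;

    assert(dp != NULL);
    if ((e = readdir(dp->sys)) == NULL)
        return NULL;               /* end of directory or error */
    dp->d.ino = (long) e->d_ino;
    strncpy(dp->d.name, e->d_name, MY_NAME_MAX);
    dp->d.name[MY_NAME_MAX] = '\0';
    return &dp->d;
}

/* Close the system stream and free the handle. */
void my_closedir(MyDIR *dp)
{
    if (dp != NULL) {
        closedir(dp->sys);
        free(dp);
    }
}
```

With this layout, an entry obtained from one handle stays valid while a second directory is being read; it is only overwritten by the next my_readdir() call on the same handle.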