I am writing a program that searches through all the sub-directories of a given directory. I know the name of the file I am looking for (data.txt), but I need to find all of the (possibly multiple) locations where it exists. I am using this code to search:

struct dirent *dp;
struct stat s;
DIR *dir;

char path[]="/some/path/here/";

if((dir=opendir(path))==NULL){return;}

while((dp=readdir(dir))!=NULL){

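  char *temp=malloc((strlen(path)+strlen(dp->d_name)+4)*sizeof(*temp));
  if(temp==NULL){break;}//out of memory, stop scanning
  sprintf(temp,"%s%s",path,dp->d_name);//concatenate path

  if(lstat(temp,&s)!=0){//stat the path, skip entries that cannot be examined
    free(temp);
    continue;
  }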

  if(S_ISREG(s.st_mode)){//if regular file
    if(!strcmp(dp->d_name,"data.txt")){
      printf("found one: %s\n",temp);//found the target file
    }

  }else if(S_ISDIR(s.st_mode)){//if directory (lstat never reports a symlink as a directory)
    if(strcmp(dp->d_name,".") && strcmp(dp->d_name,"..")){//ignore "." and ".."
      //recurse on the subdirectories
    }

  }


  free(temp);

}
closedir(dir);

The code works fine and it's still very fast, but I feel that it's very inefficient to lstat every file and directory in the filesystem just to find the directories.

Is there a more efficient way of searching so that only directories are returned via readdir?

I'm using gcc on Fedora 13.

A: 

You could read the source of the Linux find command.

zed_0xff
+2  A: 

ftw() (or nftw()) is the call to use to implement a find-like function.

The reason stat or lstat is required is to know what file type you have - regular, link, directory, etc.

It is possible, though not likely, for "data.txt" to be a directory, a link, or a regular file, so you have to check the type to get what you want. ftw() passes a struct stat * to a callback function, which you supply as an argument to ftw().
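
As a rough sketch of the callback approach with nftw(): the names `visit`, `find_data_txt` and `match_count` are made up for illustration, and the descriptor limit of 16 is an arbitrary choice. FTW_PHYS keeps nftw() from following symbolic links, which mirrors the lstat() behaviour in the question.

```c
#define _XOPEN_SOURCE 700   /* must precede all includes so nftw() is declared */
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() has no user-data argument, so this sketch counts matches in a
   file-scope variable. */
static int match_count = 0;

/* Called by nftw() once for every entry in the tree. */
static int visit(const char *fpath, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    /* fpath + ftwbuf->base is the entry's name without the directory part. */
    if (typeflag == FTW_F && S_ISREG(sb->st_mode) &&
        strcmp(fpath + ftwbuf->base, "data.txt") == 0) {
        printf("found one: %s\n", fpath);
        match_count++;
    }
    return 0;   /* returning 0 tells nftw() to keep walking */
}

/* Hypothetical helper: number of regular files named data.txt under root,
   or -1 if the walk could not be started or failed. */
int find_data_txt(const char *root)
{
    match_count = 0;
    if (nftw(root, visit, 16, FTW_PHYS) == -1)
        return -1;
    return match_count;
}
```

A call such as find_data_txt("/some/path/here") prints each match and returns the count. A directory or symlink that happens to be named data.txt is skipped by the type check in the callback.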

jim mcnamara
This function calls stat behind the scenes, so they are most likely doing the same thing. I was just curious if there was a function that returned only the directories, and skipped over files altogether.
sigint
Yes, there is - you write it as your callback function.
jim mcnamara
+2  A: 

Instead of using lstat on each returned value, use the dirent's d_type field (see the readdir man page), e.g.:

while((dp=readdir(dir))!=NULL){
    ...
    if (dp->d_type == DT_REG)
    {
      /* handle regular file */
    }
    else if (dp->d_type == DT_DIR)
    {
      /* handle directory */
    }
}
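
One caveat from the same man page: not every filesystem fills in d_type, and those that don't report DT_UNKNOWN. A fuller sketch that recurses and falls back to lstat() only in that case might look like the following. `search_tree` and `PATH_BUF` are names invented here, the fixed-size path buffer is a simplification, and the code assumes a glibc-style dirent that has d_type (a portable build would test `_DIRENT_HAVE_D_TYPE` first).

```c
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants on glibc */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define PATH_BUF 4096     /* assumed upper bound on path length for this sketch */

/* Hypothetical helper: prints every regular file named `target` below
   `path` and returns how many were found. */
int search_tree(const char *path, const char *target)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return 0;                       /* missing or unreadable: skip it */

    int found = 0;
    struct dirent *dp;
    while ((dp = readdir(dir)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;                   /* ignore "." and ".." */

        char child[PATH_BUF];
        int n = snprintf(child, sizeof child, "%s/%s", path, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;                   /* path too long for this sketch */

        unsigned char type = dp->d_type;
        if (type == DT_UNKNOWN) {
            /* The filesystem gave no type, so ask lstat() for this entry only. */
            struct stat s;
            if (lstat(child, &s) != 0)
                continue;
            if (S_ISREG(s.st_mode))
                type = DT_REG;
            else if (S_ISDIR(s.st_mode))
                type = DT_DIR;
        }

        if (type == DT_REG && strcmp(dp->d_name, target) == 0) {
            printf("found one: %s\n", child);
            found++;
        } else if (type == DT_DIR) {
            found += search_tree(child, target);   /* recurse into subdirectory */
        }
    }
    closedir(dir);
    return found;
}
```

Symlinks show up as DT_LNK, so they are neither matched nor followed. On filesystems that do report types, no lstat() call is made at all.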
Hasturkun
I was actually doing exactly that at first. I switched to my current implementation after reading on http://linux.die.net/man/3/readdir that `According to POSIX, the dirent structure contains a field char d_name[] of unspecified size, (...) Use of other fields will harm the portability of your programs.`
sigint
@jdkomo: If you're targeting Linux, just go with this. It's by far the easiest to bolt on to what you have, and the least invasive. I recommend you check for `_DIRENT_HAVE_D_TYPE` as mentioned in the NOTES, and adjust accordingly.
Matt Joiner
@Matt Joiner, I was never targeting windows, so this may work out fine. I assumed that when they referred to portability, they meant cross-distro, Ubuntu/Fedora/etc and not Linux/Windows. Thanks.
sigint
I also recommend using the `d_type` field. I have used this in some of my Cygwin-based Windows apps as well and have had no problems there.
bta
A: 

You may want to look at the glob function. It sounds like you might be trying to re-implement it.
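
For what it's worth, a minimal glob() sketch is below. `print_matches` is a made-up helper name. Keep in mind that a `*` in a standard glob() pattern matches within a single path component, so a pattern like /some/path/here/*/data.txt only covers one directory level and is not a full recursive search.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: prints every path matching `pattern` and returns
   the number of matches, or -1 on a glob() error. */
int print_matches(const char *pattern)
{
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH) {
        globfree(&g);
        return 0;                /* nothing matched */
    }
    if (rc != 0) {
        globfree(&g);
        return -1;               /* GLOB_NOSPACE or GLOB_ABORTED */
    }

    for (size_t i = 0; i < g.gl_pathc; i++)
        printf("found one: %s\n", g.gl_pathv[i]);

    int count = (int)g.gl_pathc;
    globfree(&g);                /* release the path list allocated by glob() */
    return count;
}
```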

bta