I need to get all the files under D:\dic and loop over them to process each one individually.

Does MATLAB support this kind of operation?

It can be done in other scripting languages such as PHP and Python.

+1  A: 

You're looking for dir to return the directory contents.

To loop over the results, you can simply do the following (it's been a while since I last used MATLAB, so it may not be 100% accurate):

dirlist = dir('.');
for i = 1:length(dirlist)
    dirlist(i)   % no semicolon, so each entry is displayed
end

This should give you output in the following format, e.g.:

name: 'my_file'
date: '01-Jan-2010 12:00:00'
bytes: 56
isdir: 0
datenum: []
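
As a minimal sketch of how the loop above might be used for per-file processing: the folder `D:\dic` comes from the question, while the variable names `rootDir`, `entries` and `filePath` are illustrative assumptions. Filtering on the `isdir` field drops every folder, which also removes `.` and `..`.

```matlab
% Sketch only: visit each regular file directly inside one folder.
rootDir = 'D:\dic';                      % folder from the question; change as needed
entries = dir(rootDir);                  % struct array, one element per entry
entries = entries(~[entries.isdir]);     % keep files only (drops '.' and '..' too)
for k = 1:numel(entries)
    filePath = fullfile(rootDir, entries(k).name);   % build the full path
    fprintf('%s (%d bytes)\n', filePath, entries(k).bytes);
end
```

This does not descend into subfolders; it only covers the single-directory case.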
James Burgess
Can you make it search recursively, including files in subdirectories but excluding the directories themselves?
Gtker
Not off the top of my head, no (I no longer have regular access to Matlab), but this may help you: http://www.mathworks.com/matlabcentral/fileexchange/19550-recursive-directory-listing
James Burgess
How do I exclude `.` and `..`?
Gtker
manually test for `.` and `..`
Jason S
recursively: write your program to work recursively
Jason S
@Runner: to exclude . and .., remove the first two entries in the output of dir. Or, in case you're looking for a specific file type, run `dir('*.ext')`, which automatically excludes directories (unless they end in .ext, of course)
Jonas
A: 

You can use regexp or strcmp to eliminate `.` and `..`, or you could use the isdir field if you only want the files in the directory, not the folders.

list=dir(pwd);  %get info of files/folders in current directory
isfile=~[list.isdir]; %determine index of files vs folders
filenames={list(isfile).name}; %create cell array of file names

or combine the last two lines:

filenames={list(~[list.isdir]).name};

For a list of the folders in the directory, excluding `.` and `..`:

dirnames={list([list.isdir]).name};
dirnames=dirnames(~(strcmp('.',dirnames)|strcmp('..',dirnames)));

From this point, you should be able to throw the code in a nested for loop, and continue searching each subfolder until your dirnames returns an empty cell for each subdirectory.
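
One way the loop described above might look is sketched here, using a `while` loop over a list of folders still to visit rather than nested `for` loops. The names `rootDir`, `pending` and `allFiles` are assumptions made for illustration, and the sketch starts from the current folder.

```matlab
% Sketch only: non-recursive walk that collects full paths of all files.
rootDir  = pwd;            % assumption: start from the current folder
pending  = {rootDir};      % folders that still have to be listed
allFiles = {};             % full paths of the files found so far
while ~isempty(pending)
    current = pending{end};            % take the most recently added folder
    pending(end) = [];
    list     = dir(current);
    names    = {list.name};
    isFolder = [list.isdir];
    % Files in this folder, with the folder path prepended
    files    = names(~isFolder);
    allFiles = [allFiles, cellfun(@(n) fullfile(current, n), files, ...
                                  'UniformOutput', false)];
    % Subfolders to visit later, skipping '.' and '..'
    subs     = names(isFolder & ~ismember(names, {'.', '..'}));
    pending  = [pending, cellfun(@(n) fullfile(current, n), subs, ...
                                 'UniformOutput', false)];
end
```

The loop ends when `pending` is empty, which corresponds to every subfolder having returned an empty `dirnames`. Links that point back to a parent folder are not handled here.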

Doresoom
+1, this is very compact, but it doesn't search recursively.
Gtker
@Runner: It does if you use some for and while loops... but I'm too lazy to implement that right now.
Doresoom
A: 

I don't know of a single-function method for this, but you can use genpath to build a recursive list of subdirectories only. The list is returned as a single string delimited by pathsep (a semicolon on Windows, a colon on UNIX), so you'll have to split it using strread, i.e.

dirlist = strread(genpath('/path/of/directory'),'%s','delimiter',pathsep)

If you don't want to include the given directory, remove the first entry of dirlist, i.e. dirlist(1)=[]; since it is always the first entry.

Then get the list of files in each directory with a looped dir.

filenamelist=[];
for d=1:length(dirlist)
    % keep only filenames
    filelist=dir(dirlist{d});
    filelist={filelist.name};

    % remove '.' and '..' entries
    filelist([strmatch('.',filelist,'exact');strmatch('..',filelist,'exact')])=[];
    % or to ignore all hidden files, use filelist(strmatch('.',filelist))=[];

    % prepend directory name to each filename entry, separated by filesep*
    for f=1:length(filelist)
        filelist{f}=[dirlist{d} filesep filelist{f}];
    end

    filenamelist=[filenamelist filelist];
end

*filesep returns the directory separator for the platform on which MATLAB is running.

This gives you a list of filenames with full paths in the cell array filenamelist. Not the neatest solution, I know.

JS Ng
For performance reasons I don't want to use `genpath`; it essentially searches twice.
Gtker
One drawback to using GENPATH is that it will only include subdirectories that are allowed on the MATLAB path. For example, if you have directories named `private`, they will not be included.
gnovice
+2  A: 

Here's a function that searches recursively through all subdirectories of a given directory, collecting a list of all file names it finds:

function fileList = getAllFiles(dirName)

  dirData = dir(dirName);      %# Get the data for the current directory
  dirIndex = [dirData.isdir];  %# Find the index for directories
  fileList = {dirData(~dirIndex).name}';  %'# Get a list of the files
  if ~isempty(fileList)
    fileList = cellfun(@(x) fullfile(dirName,x),...  %# Prepend path to files
                       fileList,'UniformOutput',false);
  end
  subDirs = {dirData(dirIndex).name};  %# Get a list of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
                                               %#   that are not '.' or '..'
  for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dirName,subDirs{iDir});    %# Get the subdirectory path
    fileList = [fileList; getAllFiles(nextDir)];  %# Recursively call getAllFiles
  end

end

After saving the above function somewhere on your MATLAB path, you can call it in the following way:

fileList = getAllFiles('D:\dic');
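
To tie this back to the question, here is a sketch of looping over such a list and handling each file individually. The three paths are hypothetical stand-ins for what `getAllFiles('D:\dic')` might return, and the `.txt` filter and the `processed` list are assumptions for illustration only.

```matlab
% Sketch only: process each entry of a file list one at a time.
% Stand-in for fileList = getAllFiles('D:\dic'); these names are hypothetical.
fileList  = {'D:\dic\a.txt'; 'D:\dic\sub\b.dat'; 'D:\dic\sub\c.TXT'};
wantedExt = '.txt';                        % hypothetical filter
processed = {};
for k = 1:numel(fileList)
    [~, ~, ext] = fileparts(fileList{k});  % split off the extension
    if strcmpi(ext, wantedExt)             % case-insensitive comparison
        % Replace this line with the real per-file work (read, parse, ...).
        processed{end+1} = fileList{k};    %#ok<AGROW>
    end
end
```

In real use, the stand-in assignment would be replaced by the `getAllFiles` call shown above.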
gnovice
How to make it return the full path instead of only file names?
Gtker
+1 - Great solution. I don't know if it's necessary, but if you insert the line `fileList = cellfun(@(x) strcat([dirName,'\'],x),fileList,'UniformOutput',0);` into your solution between the first fileList definition and the subDirs definition, it will return the full path and filename for each file.
Doresoom
@Doresoom: Good suggestion, although I went with using FULLFILE instead, since it handles the choice of file separator for you (which is different on UNIX and Windows). Also, you could just do `fileList = strcat(dirName,filesep,fileList);` instead of using CELLFUN, although you can end up with extra unnecessary file separators that way, which FULLFILE also takes care of for you.
gnovice
@gnovice - Quick question: will dir always return the first two entries as . and .. ? That should make sorting them out much easier, but I was wary of just writing (3:end) instead of comparing the names.
Doresoom
Oh, what's `@(x) fullfile(dirName,x)`? An anonymous function? Why doesn't it have return values like ordinary MATLAB functions?
Gtker
@Doresoom: I'm pretty sure `.` and `..` are always the first two entries. I've never seen a case where they aren't (other than, of course, searching for files or directories matching a specific name format).
gnovice
@Runner: Yes, that's an anonymous function. The return value is just whatever value is returned from the expression it contains, which in this case is the output from FULLFILE.
gnovice
Is `'UniformOutput',false` necessary here to make it work? I've read `help cellfun` but still don't see why it's needed.
Gtker
@Runner: The arguments `'UniformOutput',false` force CELLFUN to output the results as a cell array, where each cell in the output contains the results of applying the anonymous function to the corresponding cell of the input. This is necessary when the results of processing each input cell can't be easily concatenated into another type of array (char, numeric, etc.).
gnovice
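
A small sketch may make the two points above more concrete. The file names and the folder `dic` are hypothetical. An anonymous function returns the value of its single expression, and `'UniformOutput',false` is needed whenever that value is not a scalar, as is the case for the character vectors produced by `fullfile`.

```matlab
% Sketch only: scalar results vs. non-scalar results with cellfun.
names = {'a.txt', 'bb.txt'};               % hypothetical file names

% Each call returns a scalar, so cellfun can build a numeric array.
lens  = cellfun(@(x) numel(x), names);

% Each call returns a char vector of varying length, so the results
% must be collected in a cell array via 'UniformOutput', false.
paths = cellfun(@(x) fullfile('dic', x), names, 'UniformOutput', false);
```

Here `lens` is a 1-by-2 numeric array and `paths` is a 1-by-2 cell array of character vectors.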
@gnovice, @Doresoom - According to http://www.mathworks.com/access/helpdesk/help/techdoc/ref/dir.html, the order in which 'dir' returns entries is OS-dependent. I'm not sure what happens if, for instance, you set the DOS DIRCMD variable to something that changes the order. Octave handles it OK (. and .. are still first) but I don't have MATLAB to test.
mtrw
@mtrw: Since it's sounding like there's a non-zero probability that the sort order of directories returned by DIR may vary, I modified the code to work with an arbitrary positioning of `'.'` and `'..'` in the directory list.
gnovice