I am a little stuck on a problem. I have tons of files generated daily and I need to sort them by file name and date so that my MATLAB script can read them. I currently do this manually, but was wondering if there is an easier way in MATLAB to sort and copy files.

My file names look like:

data1_2009_12_12_9.10
data1_2009_12_12_9.20
data1_2009_12_12_9.30
data1_2009_12_12_9.40
data2_2009_12_12_9.10
data2_2009_12_12_9.20
data2_2009_12_12_9.30
data2_2009_12_12_9.40
data3_2009_12_12_9.10
data3_2009_12_12_9.20
data3_2009_12_12_9.30
data3_2009_12_12_9.40
...

and tons of files like this.

Addition to the above problem:

There has to be an easier way to stitch the files together. I mean appending file 'data1_2009_12_12_9.20' after file 'data1_2009_12_12_9.10' and so on, so that I am left with one huge text file at the end named data1_2009_12_12 (or whatever), containing all the data stitched together. The only way I know now is to open each file with an individual dlmread command in MATLAB and write them out one after another with xlswrite (or the more trivial way of copying and pasting manually).

+1  A: 

Hi

In Matlab the function call

files = dir('.');

returns a structure array (called files), with one element per directory entry and the fields

name

date

bytes

isdir

datenum

You can use your usual Matlab techniques for manipulating the names in files.name.
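As a minimal sketch of what that manipulation might look like, the following collects the names into a cell array and orders the entries either alphabetically or by modification time. The scratch folder, the variable names, and the three file names are made up for illustration; only dir, sort, and the name, isdir, and datenum fields are taken from the description above.

```matlab
% Hypothetical setup: a scratch folder holding three empty files
demoDir = tempname;
mkdir(demoDir);
demoNames = {'data2_2009_12_12_9.10', 'data1_2009_12_12_9.20', ...
             'data1_2009_12_12_9.10'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(demoDir, demoNames{k}), 'w');
    fclose(fid);
end

files = dir(demoDir);                % struct array, one element per entry
files = files(~[files.isdir]);       % drop '.', '..' and any subfolders
names = {files.name};                % cell array of file names

[sortedNames, order] = sort(names);  % plain alphabetical sort
filesByName = files(order);

[~, order] = sort([files.datenum]);  % oldest modification time first
filesByDate = files(order);
```

An alphabetical sort places data10 before data2, so the numbers have to be parsed out of the names once the counter goes past 9.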

Regards

Mark

High Performance Mark
+4  A: 

Working in the field of functional imaging research, I've often had to sort large sets of files into a particular order for processing. Here's an example of how you can find files, parse the file names for certain identifier strings, and then sort the file names by a given criterion...

Collecting the files...

You can first get a list of all the file names from your directory using the DIR function:

dirData = dir('your_directory');      %# Get directory contents
dirData = dirData(~[dirData.isdir]);  %# Use only the file data
fileNames = {dirData.name};           %# Get file names

Parsing the file names with a regular expression...

Your file names appear to have the following format:

'data(an integer)_(a date)_(a time)'

so we can use REGEXP to parse the file names that match the above format and extract the integer following data, the three values for the date, and the two values for the time. The expression used for the matching will therefore capture 6 "tokens" per valid file name:

expr = '^data(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)\.(\d+)$';
fileData = regexp(fileNames,expr,'tokens');  %# Find tokens
index = ~cellfun('isempty',fileData);        %# Find index of matches
fileData = [fileData{index}];                %# Remove non-matches
fileData = vertcat(fileData{:});             %# Format token data
fileNames = fileNames(index);                %# Remove non-matching file names
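To see what each of those steps produces, here is a small sketch that runs the same sequence on a hard-coded list instead of a directory listing. The three names are invented for illustration, and the underscores in the expression are written without the backslash, which should match the same names.

```matlab
% Made-up names: two that fit the pattern and one that does not
fileNames = {'data2_2009_12_12_9.40', 'notes.txt', 'data1_2009_12_12_9.10'};
expr = '^data(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.(\d+)$';

fileData = regexp(fileNames, expr, 'tokens');  % one cell per name, empty if no match
index = ~cellfun('isempty', fileData);         % logical mask of the matches: [1 0 1]
fileData = [fileData{index}];                  % unwrap the outer cell layer
fileData = vertcat(fileData{:});               % 2-by-6 cell array of strings
fileNames = fileNames(index);                  % keep only the matching names

firstTokens = fileData(1,:);                   % tokens for 'data2_2009_12_12_9.40'
```

Each row of fileData holds the six captured strings for one file, in the same order as the remaining entries of fileNames.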

Sorting based on the tokens...

You can convert the above string tokens to numbers (using the STR2DOUBLE function) and then convert the date and time values to a date number (using the function DATENUM):

nFiles = size(fileData,1);              %# Number of files matching format
fileData = str2double(fileData);        %# Convert from strings to numbers
fileData = [fileData zeros(nFiles,1)];  %# Add a zero column (for the seconds)
fileData = [fileData(:,1) datenum(fileData(:,2:end))];  %# Format dates

The variable fileData will now be an nFiles-by-2 matrix of numeric values. You can sort these values using the function SORTROWS. The following code will sort first by the integer following the word data and next by the date number:

[fileData,index] = sortrows(fileData,1:2);  %# Sort numeric values
fileNames = fileNames(index);               %# Apply sort to file names
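A short worked example may help show why the numeric sort matters. The names and token values below are invented so that plain alphabetical order comes out wrong; the variables keys, numericOrder, and alphaOrder are illustrative and not part of the code above.

```matlab
% Made-up names chosen so that alphabetical order is misleading
fileNames = {'data10_2009_12_12_9.10', 'data2_2009_12_12_9.10', ...
             'data2_2009_12_9_9.10'};

% Token values as they would come out of the parsing step:
% [dataNumber year month day hour minute]
tokens = [10 2009 12 12 9 10;
           2 2009 12 12 9 10;
           2 2009 12  9 9 10];

secs = zeros(size(tokens,1), 1);                       % the names carry no seconds
keys = [tokens(:,1) datenum([tokens(:,2:end) secs])];  % [dataNumber dateNumber]

[keys, index] = sortrows(keys, 1:2);   % by data number, then by date number
numericOrder = fileNames(index);       % data2 (Dec 9), data2 (Dec 12), data10
alphaOrder = sort(fileNames);          % data10, data2 (Dec 12), data2 (Dec 9)
```

Comparing numericOrder with alphaOrder shows the two cases a character-by-character sort gets wrong: data10 ahead of data2, and day 12 ahead of day 9.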

Concatenating the files...

The fileNames variable now contains a cell array of all the files in the given directory that match the desired file name format, sorted first by the integer following the word data and then by the date. If you now want to concatenate all of these files into one large file, you could try using the SYSTEM function to call a system command to do this for you. If you are using a Windows machine, you can do something like what I describe in this answer to another SO question where I show how you can use the DOS for command to concatenate text files. You can try something like the following:

inFiles = strcat({'"'},fileNames,{'", '});  %# Add quotes, commas, and spaces
inFiles = [inFiles{:}];                     %# Create a single string
inFiles = inFiles(1:end-2);                 %# Remove last comma and space
outFile = 'total_data.txt';                 %# Output file name
system(['for %f in (' inFiles ') do type "%f" >> "' outFile '"']);

This should create a single file total_data.txt containing all of the data from the individual files, concatenated in the order that their names appear in the variable fileNames. Because >> appends, an existing total_data.txt should be deleted first. Keep in mind that each file will probably have to end with a newline character to get things to concatenate correctly.
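If a shell command is not an option, the same concatenation can be sketched with MATLAB's own file I/O. This is a different technique from the DOS for loop above, not a translation of it, and the scratch folder, the two input files, and their contents are hypothetical.

```matlab
% Hypothetical setup: two small input files in a scratch folder
demoDir = tempname;
mkdir(demoDir);
fileNames = {'data1_2009_12_12_9.10', 'data1_2009_12_12_9.20'};
contents = {sprintf('1 2 3\n'), sprintf('4 5 6\n')};
for k = 1:numel(fileNames)
    fid = fopen(fullfile(demoDir, fileNames{k}), 'w');
    fwrite(fid, contents{k}, 'char');
    fclose(fid);
end

% Append each file, in the order given by fileNames, to one output file
outFile = fullfile(demoDir, 'total_data.txt');
fidOut = fopen(outFile, 'w');                    % 'w' starts from an empty file
for k = 1:numel(fileNames)
    fidIn = fopen(fullfile(demoDir, fileNames{k}), 'r');
    data = fread(fidIn, Inf, 'uint8=>uint8');    % raw bytes, copied unchanged
    fclose(fidIn);
    fwrite(fidOut, data, 'uint8');
end
fclose(fidOut);
```

Copying raw bytes leaves the data untouched, so the caveat about each input ending in a newline applies here as well.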

gnovice
Wow, the code looks complicated. I will try to unscramble this after work. Thanks.
AP
Thanks a lot. I will have to incorporate this into my program for cases where the file size becomes too big, so that I can process more inside my program.
AP
I understand if the above may be more complicated than you need. I actually adapted it from some code I was using for collecting files and directories that had a more varied and complicated set of naming conventions. ;)
gnovice
+3  A: 

An alternative to what @gnovice suggested is to loop over the file names and use sscanf() to recover the different sections in the filenames you are interested in:

n = sscanf(filename, 'data%d_%d_%d_%d_%d.%d')
n(1)    %# data number
n(2)    %# the year
...

Example:

files = dir('data*');                 %# list all entries beginning with 'data'
parts = zeros(length(files), 6);      %# read all the 6 parts into this matrix
for i=1:length(files)
    parts(i,:) = sscanf(files(i).name, 'data%d_%d_%d_%d_%d.%d')';  %'#transposed
end

[parts idx] = sortrows(parts, [6 1]); %# sort by one/multiple columns of choice
files = files(idx);                   %# apply the new order to the files struct
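The loop above assumes that every name yields exactly six numbers; a name that yields fewer makes the row assignment fail. A minimal sketch of one way to guard against that is shown below, using a hard-coded list of made-up names instead of dir output, and sorting on all six columns so the result is ordered by data number and then chronologically. The keep mask is an illustrative addition.

```matlab
% Made-up names, including one that does not follow the full pattern
names = {'data2_2009_12_12_9.10', 'data10.txt', 'data1_2009_12_12_9.20'};

parts = zeros(numel(names), 6);
keep = false(numel(names), 1);
for i = 1:numel(names)
    n = sscanf(names{i}, 'data%d_%d_%d_%d_%d.%d');
    if numel(n) == 6                  % all six fields were read
        parts(i,:) = n';
        keep(i) = true;
    end
end
parts = parts(keep,:);                % drop the rows of skipped names
names = names(keep);

[parts, idx] = sortrows(parts, 1:6);  % data number, then year through minute
names = names(idx);
```

The check only counts fields, so a name with extra trailing text after the sixth number would still be accepted.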


EDIT:

I just saw your edit about merging those files. That can be done easily from the shell. For example, let's create one big file for all data from the year 2009 (assuming it makes sense to stack the files on top of each other):

On Windows:

type data*_2009_* > 2009.backup

On Unix:

cat data*_2009_* > 2009.backup
Amro
The first part of your answer will work well, and is a little clearer and easier to understand than using REGEXP ;), but may run into problems if there are ever other unrelated files in the directory with names like "data10.txt". I think the second part with the shell commands will only work if the default order of the files in the directory is already sorted appropriately, which *may not* be the case for what the OP is doing (i.e. "data1_2009_12_12_9.10" would come before "data1_2009_12_9_9.10" alpha-numerically).
gnovice
The point was to show how to use the shell to concatenate files. We can always plug in the file names sorted in the previous step and issue the command using MATLAB's system() function, or even append to the output file using `cat datafile >> output` one file at a time (in a for loop similar to the one before). As for the first point, you can make the search pattern more specific, such as `data*_*_*_*_*.*`, to avoid any unrelated files :)
Amro
@Amro: really neat trick to stitch files. I never knew one could do this in the command prompt.
AP
