ansaurus

Question

How to parse the file name and rename in Matlab

Answer 1

A:

It's not clear to me whether you want to build the file name from the date. If you just want to change the name of the file you read, you can do this:

filename = 'file_1_2010_03_03.csv';
newfilename = strrep(filename,'file_1_', 'newfile_');
xlswrite(newfilename,M)

UPDATE:

To parse the date from the file name:

dtstr = strrep(filename,'file_1_','');
dtstr = strrep(dtstr,'.csv','');
DT = datenum(dtstr,'yyyy_mm_dd');
disp(datestr(DT))

To build a file name based on a date (today's, for example):

filename = ['file_', datestr(date,'yyyy_mm_dd') '.csv'];

yuk 2010-03-16 16:25:21

I see, I missed that the number after file changes. Go with other solutions.

yuk 2010-03-16 16:48:00

Answer 2

+1 A:

It looks like I misunderstood what you meant with 'file_1', 'file_2' - I thought the numbers 1 and 2 had some kind of importance.

oldFileName = 'something_2010_03_03.csv';
%# extract the date (it's returned in a cell array)
theDate = regexp(oldFileName,'(\d{4}_\d{2}_\d{2})','match');
newFileName = sprintf('newfile_%s.xls',theDate{1});

Older Version with Explanations

I assume that the date in all your files is the same. So your program would go

%# load the files, put the names into a cell array
fileNames = {'file_1_2010_03_03.csv','file_2_2010_03_03.csv','file_3_2010_03_03.csv'};

%# parse the file names for the number and the date
%# This expression looks for the n-digit number (1,2, or 3 in your case) and puts
%# it into the field 'number' in the output structure, and it looks for the date
%# and puts it into the field 'date' in the output structure
%# Specifically, \d finds digits, \d+ finds one or several digits, _\d+_
%# finds one or several digits that are preceded and followed by an underscore
%# _(?<number>\d+)_ finds one or several digits that are preceded and followed
%# by an underscore and puts them (as a string) into the field 'number' in the 
%# output structure. The date part is similar, except that regexp looks for 
%# specific numbers of digits
tmp = regexp(fileNames,'_(?<number>\d+)_(?<date>\d{4}_\d{2}_\d{2})','names');
nameStruct = cat(1,tmp{:}); %# regexp returns a cell array. Catenate for ease of use

%# maybe you want to loop, or maybe not (it's not quite clear from the question), but 
%# here's how you'd do with a loop. Anyway, since the information about the filenames
%# is conveniently stored in nameStruct, you can access it any way you want.
for iFile = 1:numel(nameStruct)
   %# do some processing, get the matrix M

   %# and create the output file name
   outputFileX = sprintf('newfileX_%s_%s.xls',nameStruct(iFile).number,nameStruct(iFile).date);
   %# and save
   xlswrite(outputFileX,M)
end

See regular expressions for more details on how to use them. Also, you may be interested in uipickfiles to replace uigetfile.

Jonas 2010-03-16 16:27:48

What does '\d+' do? What does the '+' do?

Paul 2010-03-16 17:10:29

@Paul: I have added a bit more explanation. Hope it makes things clearer!

Jonas 2010-03-16 17:38:45

Let's make it simple. I use [a,patha]=uigetfile({'*.csv'},'Select the file','c:\ Data'); File_selected=a; file1=[patha a]; oldFileName = a; % newFileName = regexprep(oldFileName,'pwr_avg_\d+_','newfile_'). When I do this, it gives me the new file name newfile_03_03.csv. Why did it miss 2010, given that my initial file name was file_1_2010_03_03.csv?

Paul 2010-03-16 17:51:36

Well, if your old file name was 'pwr_avg_2010_03_03.csv', then newFileName will be 'newfile_03_03.csv', because 2010 matches _\d+_, i.e. multiple digits surrounded by underscores. What are your file names exactly? Note that if all you need is the dates, the regexp in my EDIT works perfectly fine.

Jonas 2010-03-16 18:58:54
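
The behaviour described in the comment above can be reproduced with a short sketch. The file name 'pwr_avg_2010_03_03.csv' is taken from that comment; the variable name `swallowed` is only illustrative, and the second half is one possible way to keep the year, not necessarily what the poster ended up using.

```matlab
% File name from the comment above (prefix without a trailing number)
oldFileName = 'pwr_avg_2010_03_03.csv';

% '_\d+_' matches the first run of digits between underscores.
% With no number in the prefix, that run is the year, so it is replaced too.
swallowed = regexprep(oldFileName, 'pwr_avg_\d+_', 'newfile_');
disp(swallowed)

% Matching the date itself (4 digits, 2 digits, 2 digits, separated by
% underscores) does not depend on what the prefix looks like.
% 'once' returns the match as a string instead of a cell array.
theDate = regexp(oldFileName, '\d{4}_\d{2}_\d{2}', 'match', 'once');
newFileName = sprintf('newfile_%s.xls', theDate);
disp(newFileName)
```

Because the second pattern fixes the number of digits in each group, a prefix such as 'file_1_' or 'pwr_avg_' makes no difference to the result.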

I have different file names, but the format is the same: XX_2010_03_03.csv. All of them have a different XX.

Paul 2010-03-16 19:14:16

@Paul: There's your problem. We were assuming that XX would always be XX_#, where # is some *number*, because those were the examples you gave above. Our solutions are snagging 2010 as the number we expect *before* the date, thus leaving it out of the date.

gnovice 2010-03-16 19:18:50

@Paul: now my solution just extracts the date. Note that the date should always be (4 digits)(underscore)(2 digits)(underscore)(2 digits).

Jonas 2010-03-16 19:26:19

@gnovice: I do not care about XX; all I want is _2010_03_03, to make newfile_2010_03_03.xls

Paul 2010-03-16 19:28:40

@Jonas: Woo hoo it works!! Thanks a lot Jonas. Have a good day

Paul 2010-03-16 19:39:52

+1: After all the reclarifications of the question details, it looks like REGEXP is a more compact solution than TEXTSCAN. ;)

gnovice 2010-03-16 20:00:06

@Paul: a nice thing to do is upvoting/accepting an answer if it turns out to help.

Jonas 2010-03-16 20:28:37

@gnovice: thanks!

Jonas 2010-03-16 20:29:03

Answer 3

A:

If your 3 files from UIGETFILE all have the same date in their name, then you can just use one of them to do the following (after you have processed all your data from the 3 files):

fileName = 'file_1_2010_03_03.csv';          %# One of your 3 file names
data = textscan(fileName,'%s',...            %# Split string at '_' and '.'
                'Delimiter','_.');
fileString = sprintf('_%s_%s_%s.xls',...     %# Make the date part of the name
                     data{1}{(end-3):(end-1)});
xlswrite(['newfileX' fileString],dataX);     %# Output "X" data
xlswrite(['newfileXY' fileString],dataXY);   %# Output "XY" data
xlswrite(['newfileXZ' fileString],dataXZ);   %# Output "XZ" data
xlswrite(['newfileYZ' fileString],dataYZ);   %# Output "YZ" data

The function TEXTSCAN is used to break up the old file name at the points where '_' or '.' characters occur. The function SPRINTF is then used to put the pieces for the date back together.

gnovice 2010-03-16 16:41:51
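
To see why indexing from the end picks out the date whatever the prefix is, it can help to look at the pieces. The sketch below swaps in REGEXP with the 'split' option instead of TEXTSCAN, purely because it returns a plain cell array that is easy to inspect; the helper names `parts1`, `parts2`, `datePart1` and `datePart2` are illustrative and not from the answer.

```matlab
% Two file names with different prefixes but the same date layout
name1 = 'file_1_2010_03_03.csv';
name2 = 'pwr_avg_2010_03_03.csv';

% Split at every '_' or '.'; the last piece is the extension and the
% three pieces before it are year, month and day.
parts1 = regexp(name1, '[_.]', 'split');
parts2 = regexp(name2, '[_.]', 'split');

% Counting from the end ignores however many pieces the prefix has
datePart1 = sprintf('_%s_%s_%s.xls', parts1{end-3:end-1});
datePart2 = sprintf('_%s_%s_%s.xls', parts2{end-3:end-1});

disp(['newfileX' datePart1])
disp(['newfileX' datePart2])
```

This relies on the name ending in exactly one extension and on the date being the last three underscore-separated fields before it.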

Is there any way for the output file not to have the .csv? I mean, with the above code I get the new file as newfileXY_2010_03_03.csv.xls

Paul 2010-03-16 17:09:32

@Paul: I fixed the typo in the code. It should work how you want it to now.

gnovice 2010-03-16 17:24:48

I liked your earlier answer; it was simpler :P

Paul 2010-03-16 17:26:31

With the code, my output gives me the right name minus 2010. Why am I missing it? I get it when I enter the name in quotes like you did, but I am using uigetfile, so for me currentFile = a

Paul 2010-03-16 18:26:32

@Paul: I simplified the code since your comments suggested you were processing the 3 files first, then outputting 4 files. The above code works for me without any problems, so you should double check what your variable `a` contains (it should have the same format as `fileName` in my above code).

gnovice 2010-03-16 18:37:33

Yes, this code works, but my output looks like newfileX_03_03.xls. I do get the full name if I manually input fileName = 'file_1_2010_03_03.csv', but I take the input from uigetfile, like this: [a,patha]=uigetfile({'*.csv'},'Select the file','c:\ Data'); File_selected=a; file1=[patha a]; fileName = a; and my a is file_1_2010_03_03.csv, which I am checking while I run.

Paul 2010-03-16 18:51:00

@Paul: Now that you have specified that the `'file_1'` part of the name could be anything, I've updated my answer.

gnovice 2010-03-16 19:55:10

@gnovice: Thank you very much.

Paul 2010-03-16 19:58:21

Answer 4

A:

Presumably, all of these files are sitting in a directory somewhere and you'd like to process them in batch. You can use code like this to read the files in a particular directory and find the ones that end in 'csv'. That way, you don't have to change your code at all if you'd like to process a new file -- you just drop it in the directory and run your program.

extension = 'csv';

files = dir();  % e.g. use current directory

% find files with the proper extension
extLength = length(extension);
for k = 1:length(files)
    nameLength = length(files(k).name);
    if nameLength > extLength
        if strcmp(files(k).name((nameLength - extLength + 1):nameLength), extension)
            disp(files(k).name)
            % process file here...
        end
    end
end

You can make it more compact by incorporating the regexp processing that Jonas suggested.

seanmac7577 2010-03-16 16:50:35
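
One possible shape for the more compact version mentioned above is sketched here. It assumes the files sit in the current directory and follow the XX_yyyy_mm_dd.csv layout discussed in this thread; `newNames` is an illustrative variable, and the actual reading and writing of data is left out.

```matlab
% Let dir do the extension filtering with a wildcard
files = dir('*.csv');

newNames = {};   % collected output names, one per matching file
for k = 1:numel(files)
    % Pull the date out of the name; empty if the name has no date
    theDate = regexp(files(k).name, '\d{4}_\d{2}_\d{2}', 'match', 'once');
    if isempty(theDate)
        continue   % skip csv files that do not follow the naming scheme
    end

    newNames{end+1} = sprintf('newfile_%s.xls', theDate);
    fprintf('%s -> %s\n', files(k).name, newNames{end});

    % process the file here and write the result under newNames{end}
end
```

If several input files share a date, they map to the same output name, so some part of the prefix would have to be kept to tell them apart.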
