I would like to read a (fairly big) log file into a MATLAB cell array of strings in one step. So far I have used the usual approach:
s = {};
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
    s = [s; tline];  %# appends one cell per line, so s grows on every iteration
    tline = fgetl(fid);
end
fclose(fid);
but this is slow. I have found that
fid = fopen('test.txt');
x = fread(fid,'*char');
fclose(fid);
is much faster, but I get an n-by-1 char array x. I could try to convert x to a cell array of strings, but then I run into line-ending trouble: the delimiter seems to be \n\r, i.e. 10 and 13 in ASCII (I looked at the end of the first line), but those two characters often do not follow each other and sometimes even show up alone.
So my question: is there an easy, fast way to read an ASCII file into a cell array of strings in one step, or to convert x to one?
Thank you.
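For the second option (converting x), a minimal sketch is to split the char array on line endings with regexp. It assumes that every line ends in \r\n, a lone \n, or a lone \r; the scratch file and the variable fname are made up so that the example runs on its own, and the approach is not timed against the real logs here.

```matlab
% Write a small sample with mixed line endings (hypothetical scratch file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, sprintf('first\r\n  second\nthird\rfourth\n'));
fclose(fid);

% Read the whole file in one call and transpose to a 1-by-N row vector,
% because regexp expects a row of characters.
fid = fopen(fname, 'r');
x = fread(fid, '*char')';
fclose(fid);
delete(fname);

% Split on \r\n first, then on a lone \n or a lone \r (assumed line endings).
s = regexp(x, '\r\n|\n|\r', 'split')';   % N-by-1 cell array of strings

% A trailing line ending leaves one empty last entry; drop it.
if ~isempty(s) && isempty(s{end})
    s(end) = [];
end
```

Because the pattern tries \r\n before the single characters, a Windows-style ending is not counted as two line breaks. If the file really uses \n\r pairs, the pattern would need that alternative added in front.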
edit:
reading via fgetl:
Code                          Calls     Total Time   % Time
tline = lower(fgetl(fid));    903113    14.907 s     61.2%
reading via fread:
>> tic;for i=1:length(files), fid = fopen(files(i).name);x=fread(fid,'*char*1');fclose(fid); end; toc
Elapsed time is 0.208614 seconds.
edit2:
I have tested preallocation; it does not help :(
files = dir('.');
tic
for i = 1:length(files)
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    %# preassign s to some large cell array
    sizS = 50000;
    s = cell(sizS,1);
    lineCt = 1;
    fid = fopen(files(i).name);
    tline = fgetl(fid);
    while ischar(tline)
        s{lineCt} = tline;
        lineCt = lineCt + 1;
        %# grow s if necessary
        if lineCt > sizS
            s = [s; cell(sizS,1)];
            sizS = sizS + sizS;
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    %# remove empty entries in s
    s(lineCt:end) = [];
end
toc
Elapsed time is 12.741492 seconds.
edit 3/solution:
roughly 10 times faster than the original:
s = textscan(fid,'%s','Delimiter','\n','whitespace','','bufsize',files(i).bytes);
I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default of about 4000 bytes threw a buffer overflow error).
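For completeness, a self-contained sketch of the textscan call with the surrounding fopen/fclose. The scratch file and the names fname and c are made up for the example. The 'bufsize' option from the line above is left out here so the sketch does not depend on it; add it back if your MATLAB version reports a buffer overflow.

```matlab
% Create a small sample log with a line that starts with spaces
% (hypothetical scratch file, only for illustration).
fname = [tempname '.log'];
fid = fopen(fname, 'w');
fprintf(fid, '  indented line\nsecond line\n');
fclose(fid);

% Read every line as one string. An empty 'Whitespace' setting keeps
% leading spaces; '\n' as the delimiter makes each line one field.
fid = fopen(fname, 'r');
c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(fname);

% textscan returns a 1-by-1 cell; the lines themselves are in c{1}.
s = c{1};
```

Note that the result of textscan is a cell wrapped around the cell array of lines, so the lines are accessed as s{k} only after the s = c{1} step.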