views: 65

answers: 2
I would like to read a (fairly big) log file into a MATLAB cell array of strings in one step. I have used the usual:

s = {};
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s = [s; {tline}];   %# append the line to the cell array
   tline = fgetl(fid);
end
fclose(fid);

but this is just slow. I have found that

fid = fopen('test.txt');
x = fread(fid,'*char');
fclose(fid);

is way faster, but I get an n-by-1 char matrix x. I could try to convert x to a cell array of strings, but then I get into char encoding hell; the line delimiter seems to be \n\r, or 10 and 13 in ASCII (I've looked at the end of the first line), but those two chars often don't follow each other and even show up solo sometimes.
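
What I mean by "convert" would be something like this (just a sketch of the idea, not benchmarked; regexp's 'split' option should handle \r\n as well as lone \n or \r):

fid = fopen('test.txt');
x = fread(fid,'*char')';                %# read the whole file as one 1-by-n char row
fclose(fid);
s = regexp(x,'\r\n|\n|\r','split')';    %# split into a cell array of strings, one line each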

So my question: is there an easy, fast way to read an ASCII file into a cell array of strings in one step, or to convert x into one?

Thank you.

edit:

reading via fgetl:

Code                           Calls        Total Time      % Time
tline = lower(fgetl(fid));     903113       14.907 s        61.2%

reading via fread:

>> tic;for i=1:length(files), fid = fopen(files(i).name);x=fread(fid,'*char*1');fclose(fid); end; toc
Elapsed time is 0.208614 seconds.

edit2:

I have tested preallocation; it does not help :(

files=dir('.');
tic
for i=1:length(files),   
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    %# preassign s to some large cell array
    sizS = 50000;
    s=cell(sizS,1);

    lineCt = 1;
    fid = fopen(files(i).name);
    tline = fgetl(fid);
    while ischar(tline)
       s{lineCt} = tline;
       lineCt = lineCt + 1;
       %# grow s if necessary
       if lineCt > sizS
           s = [s;cell(sizS,1)];
           sizS = sizS + sizS;
       end
       tline = fgetl(fid);
    end
    fclose(fid);
    %# remove empty entries in s
    s(lineCt:end) = [];
end
toc

Elapsed time is 12.741492 seconds.

edit 3/solution:

roughly 10 times faster than the original:

s = textscan(fid,'%s','Delimiter','\n','whitespace','','bufsize',files(i).bytes);

I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default 4000 threw a buffer overflow error).
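
Put together, the full loop looks roughly like this (a sketch with the file handling added; the dir/.log filtering is the same as in edit2):

files=dir('.');
tic
for i=1:length(files)
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    fid = fopen(files(i).name);
    %# read all lines at once, keeping leading whitespace
    c = textscan(fid,'%s','Delimiter','\n','whitespace','','bufsize',files(i).bytes);
    fclose(fid);
    s = c{1};   %# cell array of strings, one line per cell
    %# ... parse s here ...
end
toc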

A: 

Use the fgetl function instead of fread. For more info, go here

Raze2dust
I am using fgetl; however, it is slow.
stephan hattinger
+1  A: 

The main reason your first example is slow is that s grows in every iteration. Each growth means creating a new array, copying the old lines, and appending the new one, which adds considerable overhead.

To speed things up, you can preallocate s:

%# preassign s to some large cell array
s=cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s{lineCt} = tline;
   lineCt = lineCt + 1;
   %# grow s if necessary
   if lineCt > sizS
       s = [s;cell(10000,1)];
       sizS = sizS + 10000;
   end
   tline = fgetl(fid);
end
fclose(fid);
%# remove empty entries in s
s(lineCt:end) = [];

Here's a little example of what preallocation can do for you

>> tic,for i=1:100000,c{i}=i;end,toc
Elapsed time is 10.513190 seconds.

>> d = cell(100000,1);
>> tic,for i=1:100000,d{i}=i;end,toc
Elapsed time is 0.046177 seconds.
>> 

EDIT

As an alternative to fgetl, you could use TEXTSCAN

fid = fopen('test.txt');
s = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
s = s{1};

This reads the lines of test.txt as strings into the cell array s in one go.

Jonas
I was about to give the same answer but there's something that I don't understand: The content of each cell of the cell array is undefined. Does pre-allocation help in this case?
Amaç Herdağdelen
@Amac: Yes, it does. See my edit.
Jonas
Great, thanks. Just to be sure, I replicated it with strings of varying lengths, and still got a huge performance increase.
Amaç Herdağdelen
Thank you for your quick answer! I coded the example to show the general problem, but did not think of the fact that not preallocating in the example would slow things down. However, in my case I parse the lines right away, i.e. there is no string cell s. Profiling shows about 60% of the time is spent in the line "tline = fgetl(fid);" (with the rest of the code not being optimized for now).
stephan hattinger
@stephan hattinger: What kind of parsing do you do? Could you use textscan, or fscanf to do the parsing right away?
Jonas
It is rather complicated and context-sensitive parsing. Most of the lines don't even interest me; it's only a couple of roughly 200-line blocks per file. So what I do is: find the block entry token, read lines until the block end token, and pass the string cell to a recursive parsing routine (a block is generally an indented print of a very nested object (with arrays too)).
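Roughly, once the lines are in a cell array s, the block extraction is something like this (just a sketch; BLOCK_START, BLOCK_END and parseBlock stand in for the real tokens and routine, and it assumes one end token per start token):

startIdx = find(~cellfun('isempty', strfind(s,'BLOCK_START')));
endIdx   = find(~cellfun('isempty', strfind(s,'BLOCK_END')));
for k = 1:numel(startIdx)
    block = s(startIdx(k):endIdx(k));   %# lines of one block, including the tokens
    parseBlock(block);                  %# placeholder for the recursive parsing routine
end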
stephan hattinger
@stephan hattinger: You can use `textscan`. See my edit. I hope it's a bit faster than `fgetl`.
Jonas
Thank you very much, this solved my problem! Read time went down to 1/10th compared to fgetl.
stephan hattinger