I have the following code that I need to run over a matrix with over 20000 rows. It takes several minutes to run, and the datenum and str2double functions appear to be the bottlenecks. Since no calculation depends on previous ones, is there a way to break the loop into multiple parts and have them execute in parallel? Any advice on optimising this code would be appreciated.

for i=1:length(DJI)
    DJI2(i,1)=datenum(char(DJI(i,2)),'yyyy-mm-dd');
    for j=3:7
        DJI2(i,j-1)=str2double(char(DJI(i,j)));
    end
end
A: 

Hmm. I'm more of a MATLAB person than an Octave one, but maybe I can help (if you are still looking for a solution).

This looks like the I'm-reading-in-a-file-but-I-need-to-do-something-different-than-the-tool-provides problem (otherwise you could get away with dlmread which should be pretty fast).

If there were no faster alternative within Octave, I'd try using Java (for speed rather than threading); you can call Java from Octave, though I haven't tried this in Octave, just the MATLAB equivalent.

The calls to str2double look awfully suspicious. You may be able to vectorize that, although a quick speed test on my part seems to confirm that this is a Slow Task, at least from within Octave:

octave-3.0.3.exe:15> s=sprintf('1 2\n3 4');
octave-3.0.3.exe:16> m=str2double(s)
m =

   1   2
   3   4


octave-3.0.3.exe:35> s=randn(5000,5);
octave-3.0.3.exe:36> z=num2str(s);
octave-3.0.3.exe:37> tic; s2=str2double(z); toc
Elapsed time is 18.9837 seconds.
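
To make the vectorization idea concrete, here is a minimal sketch, assuming DJI is a cell array of strings laid out as in the question (the two sample rows below are made up for illustration). It relies on str2double and datenum both accepting a cell array of strings; whether it actually beats the loop depends on the Octave version, so it is worth timing with tic/toc on the real data.

```octave
% Hypothetical stand-in for the DJI cell array from the question:
% column 1 is ignored, column 2 holds dates, columns 3-7 hold numeric strings.
DJI = {'x', '1990-01-01', '1.234', '2.345', '3.456', '4.012', '5.345';
       'y', '1990-01-02', '1',     '2',     '3',     '4',     '5'};

% str2double accepts a cell array of strings and returns a numeric matrix
% of the same shape, so the inner loop over columns 3-7 becomes one call.
values = str2double(DJI(:, 3:7));

% datenum also accepts a cell array of date strings sharing one format.
dates = datenum(DJI(:, 2), 'yyyy-mm-dd');

% Same layout as the loop's result: date number first, then five values.
DJI2 = [dates, values];
```

Passing the format string once for the whole column also spares datenum from re-parsing it on every row, which is likely where part of the per-call overhead in the loop goes.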
Jason S
A: 

The fastest thing to do, if your data is in a text file, is use textread.

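function [DJI2] = InterpretFile(datafile)
    % Skip column 1 (%*s), read column 2 as a string, columns 3-7 as floats.
    [txtdates, c2, c3, c4, c5, c6] = textread(datafile, '%*s %s %f %f %f %f %f');
    % Convert all date strings in a single vectorized call.
    dates = datenum(strvcat(txtdates),'yyyy-mm-dd');
    DJI2 = [dates c2 c3 c4 c5 c6];
end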

The format line in textread tells it to skip the first column, copy the second column as a string, and interpret the 3rd through 7th columns as floating point numbers. This assumes your data file looks something like

skip 1990-01-01 1.234 2.345 3.456 4.012 5.345
skipme2 1990-01-02 1 2 3 4 5
junk 1990-01-03 1.9 2.1 3.2 4.3 5.4
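
To check the column layout described above against these sample rows, here is a small self-contained sketch. It swaps in textscan for textread (MATLAB's documentation recommends textscan over textread), so it is a variation on the approach rather than the same call; the temporary file and variable names are only for illustration.

```octave
% Write the three sample rows to a temporary file (illustrative test data).
datafile = tempname();
fid = fopen(datafile, 'w');
fprintf(fid, 'skip 1990-01-01 1.234 2.345 3.456 4.012 5.345\n');
fprintf(fid, 'skipme2 1990-01-02 1 2 3 4 5\n');
fprintf(fid, 'junk 1990-01-03 1.9 2.1 3.2 4.3 5.4\n');
fclose(fid);

% textscan (swapped in for textread): %*s skips the first column,
% %s keeps the date as a string, and each %f reads one numeric column.
fid = fopen(datafile, 'r');
c = textscan(fid, '%*s %s %f %f %f %f %f');
fclose(fid);
delete(datafile);

% c{1} is a cell array of date strings; c{2}..c{6} are numeric columns.
dates = datenum(c{1}, 'yyyy-mm-dd');
DJI2 = [dates, c{2:6}];
```

If the real file uses a delimiter other than whitespace, textscan's 'Delimiter' option would need to be set accordingly.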

Also, str2num is about 3x faster than str2double (I guess because it doesn't do as much error checking), in case you need to use something more like your original technique.

mtrw