I have a data set that is composed as such:

2009,11,01,17,00,23,1.471700,1.472000

2009,11,01,17,01,04,1.471600,1.471900

2009,11,01,17,01,09,1.471900,1.472100

2009,11,01,17,01,12,1.472000,1.472300

2009,11,01,17,01,13,1.471900,1.472200

2009,11,01,17,01,14,1.471600,1.471900

2009,11,01,17,01,18,1.471700,1.472000

2009,11,01,17,01,18,1.471900,1.472200

I am using Octave to manipulate this data. I would like to use this tick data to create separate files containing the data aggregated into 5, 10, and 30 minute intervals. In that format the data could be plotted as a bar/candlestick chart and used for further calculations. However, I don't really have any idea how to approach looping over the data to create such files.

I am familiar with Octave and use this software, but this particular task could be undertaken in some other software to produce files for later import into Octave.

My first attempt to code this in Octave gives this error:-

error: A(I,J,...) = X: dimensions mismatch
error: called from:
error: /home/andrew/Documents/forex_convert/tick_to_min.m at line 105, column 25

The code that produces it is

[i,j]=find(fMM>=45 & fMM<50);

min_5_vec(1:length(i),1)=tick_data(min(i):max(i),1); % line 105

The code checks the "minutes" vector fMM and should create a new vector, min_5_vec, containing all tick data that occurred between HH:45:00 and HH:49:59 in every hour. This code is part of a function, and it appears to fail only on this particular line, which I find very strange: it was copied and pasted with only the figures 45 and 50 changed, and the other, similar parts of the function up to line 105 do not fail. I have visually checked the raw data and can see nothing in it that would explain the failure. Any suggestions as to the possible cause?

A: 

First, use datenum to convert your year,month,day,hour,minute,second variables to times:

datenum(2009,11,01,17,00,23)

will return a serial date number: the number of days, with the time of day as a fractional part, counted from the start of year 0. Let's say you save all of these in a vector called times. Now it should be easy enough to find the first and last time you have:

first = min(times); 
last = max(times);
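As a minimal sketch of how the times vector might be built, assuming the ticks have already been read into a numeric matrix whose first six columns are year, month, day, hour, minute, and second (the name tick_data is borrowed from the question, and elapsed_seconds is a made-up name for this illustration), datenum can convert every row in one call:

```octave
% First three sample ticks from the question: [Y M D h m s price1 price2].
% In practice this matrix would be loaded from the data file (assumption).
tick_data = [2009,11,01,17,00,23,1.471700,1.472000;
             2009,11,01,17,01,04,1.471600,1.471900;
             2009,11,01,17,01,09,1.471900,1.472100];

% datenum accepts an N-by-6 matrix of [Y M D h m s] rows and returns
% one serial date number (in days) per row.
times = datenum(tick_data(:, 1:6));

% Sanity check: span between the earliest and latest tick, in whole seconds.
elapsed_seconds = round((max(times) - min(times)) * 24 * 60 * 60);
```

Because the result is measured in days, any duration has to be expressed as a fraction of a day before it is compared with times.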

One minute, expressed in days, is equal to:

ONE_MINUTE = 1/24/60;

Now the binning is done like this:

index = 1;
means = [];
for t = first:5*ONE_MINUTE:last
    % logical mask selecting the ticks that fall inside [t, t + 5 minutes)
    current_bin = (times >= t) & (times < t + 5*ONE_MINUTE);
    % do something with all the data for which current_bin is true
    means(index) = mean(data(current_bin));
    index = index + 1;
end

Just for the example, I calculated the means of the data in each bin. I assume you have a vector called data which contains some data for each time.

(I know this can be optimized a lot, but I preferred clarity over performance for this answer)
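Since the question asks for bar/candlestick data rather than means, here is one possible vectorized sketch of the same binning idea. It is an illustration under several assumptions, not a definitive implementation: the tick values below are made up in the question's format (with a single price column), the ticks are assumed to be in chronological order, and the names price, bin_minutes, bins, group, bar_start, and ohlc are hypothetical. Unlike the loop above, it aligns the bars to clock boundaries (17:00, 17:05, ...) by working in whole seconds, and it produces no row for an interval that contains no ticks.

```octave
% Made-up ticks in the question's format: [Y M D h m s price].
tick_data = [2009,11,01,17,00,23,1.4717;
             2009,11,01,17,01,04,1.4716;
             2009,11,01,17,04,09,1.4719;
             2009,11,01,17,05,12,1.4720;
             2009,11,01,17,09,59,1.4715;
             2009,11,01,17,12,00,1.4722];

bin_minutes = 5;                          % use 10 or 30 for the other files
times = datenum(tick_data(:, 1:6));       % serial dates, in days
price = tick_data(:, 7);
n = numel(price);

% Convert to whole seconds first so that floating-point fuzz in the
% fractional days cannot push a tick into the wrong bar.
secs   = round(times * 86400);
bin_id = floor(secs / (bin_minutes * 60));

% group(k) is the row of the output bar that tick k belongs to.
[bins, ~, group] = unique(bin_id);
group = group(:);                         % accumarray needs a column of subscripts

% Position of the first and last tick of each bar (ticks assumed sorted).
first_idx = accumarray(group, (1:n)', [], @min);
last_idx  = accumarray(group, (1:n)', [], @max);

% One row per bar: open, high, low, close.
ohlc = [price(first_idx), ...
        accumarray(group, price, [], @max), ...
        accumarray(group, price, [], @min), ...
        price(last_idx)];

% Start of each bar as a serial date, usable with datestr or datevec.
bar_start = bins * bin_minutes * 60 / 86400;
```

The matrix [bar_start, ohlc] could then be written to a file, for example with dlmwrite, once per value of bin_minutes.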

Ofri Raviv