I want to calculate a cumulative sum of the values in column 2 of dat.txt below for each run of consecutive ones in column 1. The desired output is shown as dat2.txt:

dat.txt  dat2.txt
1 20     1 20  20   % 20 + 0
1 22     1 22  42   % 20 + 22
1 20     1 20  62   % 42 + 20
0 11     0 11  11
0 12     0 12  12
1 99     1 99  99   % 99 + 0   
1 20     1 20  119  % 20 + 99
1 50     1 50  169  % 50 + 119

Here's my initial attempt:

fid=fopen('dat.txt');
A  =textscan(fid,'%f%f');
in =cell2mat(A); 
fclose(fid);

i = find(in(2:end,1) == 1 & in(1:end-1,1)==1)+1;
out = in;
cumulative =in;
cumulative(i,2)=cumulative (i-1,2)+ cumulative(i,2);

fid = fopen('dat2.txt','wt');
format short g;
fprintf(fid,'%g\t%g\t%g\n',[out cumulative(:)]');
fclose(fid);
A: 
    d=[
1 20     
1 22     
1 20     
0 11     
0 12     
1 99     
1 20     
1 50
];
disp(d)

out=d;
%add a column
out(:,3)=0;

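csum=0;                          %running total for the current run of ones
for ind=1:size(d,1)
    if(d(ind,1)==0)
        csum=0;                  %a zero ends the run, so reset the total
        out(ind,3)=d(ind,2);     %zero rows keep their own value
    else
        csum=csum+d(ind,2);
        out(ind,3)=csum;
    end
end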

disp(out)
hash blue
@hash blue: Is it possible to use just the FIND function, without a loop?
Jessy
+3  A: 

This is not a completely vectorized solution (it loops over the runs of consecutive 1s), but it should be faster: for your data the loop runs only twice. It uses MATLAB's CUMSUM function.

istart = find(diff([0; d(:,1)])==1); %# start indices of sequential 1s
iend = find(diff([d(:,1); 0])==-1); %# end indices of sequential 1s

dcum = d(:,2);
for ind = 1:numel(istart)
    dcum(istart(ind):iend(ind)) = cumsum(dcum(istart(ind):iend(ind)));
end

dlmwrite('dat2.txt',[d dcum],'\t') %# write the tab-delimited file
yuk
@yuk: With the Image Processing Toolbox, you can use `bwlabel` to find the connected groups of 1's
Jonas
@Jonas: I remember this from your answer to another question. I don't have IPT here to test. Anyway, my code is pretty simple. The challenge is how to do the cumsum for all groups without a for-loop. If we had the complete groups in a cell array, we could use CELLFUN. I would be happy to see an example using bwlabel, since I face similar problems all the time.
yuk
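One possible way to build the groups without the Image Processing Toolbox, offered as a sketch rather than as an equivalent of `bwlabel`: label each run with CUMSUM over the change points, split column 2 into a cell array with MAT2CELL, and apply CUMSUM to each cell with CELLFUN. CELLFUN still iterates internally, so this mainly removes the explicit for-loop. The variable names (`runStart`, `runId`, `runLen`, `groups`) are assumptions for illustration.

```matlab
d = [1 20; 1 22; 1 20; 0 11; 0 12; 1 99; 1 20; 1 50];

% Label every run of identical values in column 1: a new run starts
% wherever the value changes (the first row always starts a run).
runStart = [true; diff(d(:,1)) ~= 0];
runId    = cumsum(runStart);             % 1 1 1 2 2 3 3 3 for this data

% Split column 2 into one cell per run, then cumsum each cell.
runLen = accumarray(runId, 1);           % rows per run: 3 2 3
groups = mat2cell(d(:,2), runLen, 1);
sums   = cellfun(@cumsum, groups, 'UniformOutput', false);
dcum   = vertcat(sums{:});

% Rows with a 0 in column 1 keep their own value.
isZero = d(:,1) == 0;
dcum(isZero) = d(isZero, 2);

disp([d dcum])
```

Runs of zeros are summed as well and then overwritten, which keeps the splitting step uniform at the cost of a little wasted work.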
+2  A: 

Here's a completely vectorized (albeit somewhat confusing-looking) solution that uses the functions CUMSUM and DIFF along with logical indexing to produce the results you want:

>> data = [1 20;...  %# Initial data
           1 22;...
           1 20;...
           0 11;...
           0 12;...
           1 99;...
           1 20;...
           1 50];
>> data(:,3) = cumsum(data(:,2));       %# Add a third column containing the
                                        %#   cumulative sum of column 2
>> index = (diff([0; data(:,1)]) > 0);  %# Find a logical index showing where
                                        %#   continuous groups of ones start
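>> before = data(:,3)-data(:,2);        %# Running total just before each row
>> offset = zeros(size(before));
>> offset(index) = diff([0; before(index)]);  %# Step the offset at each start
>> offset = cumsum(offset);             %# An adjustment required to zero the
                                        %#   cumulative sum at the start of
                                        %#   each group of ones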
>> data(:,3) = data(:,3)-offset;        %# Apply the offset adjustment
>> index = (data(:,1) == 0);            %# Find a logical index showing where
                                        %#   the first column is zero
>> data(index,3) = data(index,2)        %# For each zero in column 1 set the
                                        %#   value in column 3 to be equal to
                                        %#   the value in column 2
data =

     1    20    20
     1    22    42
     1    20    62
     0    11    11
     0    12    12
     1    99    99
     1    20   119
     1    50   169
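
The sample data contains only two groups of ones, so it does not show what happens when the running total has to be reset several times. A minimal sketch of the same CUMSUM/DIFF idea on a hypothetical input with three groups follows. The input values and the names `total`, `baseline`, `step`, and `result` are assumptions for illustration. The key point is that the offset at each group start must equal the running total just before that group, so each step is the difference between successive baselines.

```matlab
% Hypothetical input with three groups of ones in column 1.
data = [1 1; 0 2; 1 3; 1 4; 0 5; 1 6; 1 7];

total    = cumsum(data(:,2));              % 1 3 6 10 15 21 28
index    = diff([0; data(:,1)]) > 0;       % true at rows 1, 3 and 6
baseline = total - data(:,2);              % running total just before each row

% Baselines at the group starts are 0, 3 and 15, so the steps are 0, 3 and 12.
step        = zeros(size(total));
step(index) = diff([0; baseline(index)]);
result      = total - cumsum(step);

% Rows with a 0 in column 1 keep their own value.
zeroRows = data(:,1) == 0;
result(zeroRows) = data(zeroRows, 2);

disp([data result])
```

Accumulating the baselines themselves instead of their differences would give an offset of 0 + 3 + 15 = 18 for the third group rather than 15.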
gnovice
Very good idea! +1
yuk
I wonder whether the DIFF function can only test one condition. If I want the cumulative sum over rows that fulfill two conditions, how can I do that without using the FIND function?
Jessy
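DIFF works on any 0/1 vector, so one option is to combine the conditions into a single logical mask with `&` and take DIFF of that mask; logical indexing then makes FIND unnecessary. The sketch below assumes, purely for illustration, that the two conditions are "column 1 equals 1" and "column 2 is at most 50"; the variable names are likewise hypothetical.

```matlab
d = [1 20; 1 22; 1 20; 0 11; 0 12; 1 99; 1 20; 1 50];

% Hypothetical pair of conditions, combined element-wise with &.
mask = (d(:,1) == 1) & (d(:,2) <= 50);     % 1 1 1 0 0 0 1 1

total    = cumsum(d(:,2));                 % running total over all rows
starts   = diff([0; mask]) > 0;            % first row of each run where mask holds
baseline = total - d(:,2);                 % running total just before each row

% Step the offset by the change in baseline at each run start.
step         = zeros(size(total));
step(starts) = diff([0; baseline(starts)]);
dcum         = total - cumsum(step);

% Rows that fail either condition keep their own value.
dcum(~mask) = d(~mask, 2);

disp([d dcum])
```

Here the row with 99 fails the second condition, so it keeps its own value and the cumulative sum restarts on the next row.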