views:

127

answers:

4

Hi,

I have the a for loop which perform the following function:

Take a M by 8 matrix and
1) Spilt it into blocks of size 512 elements (meaning X by 8 of the matrix == 512, and the number of elements can be 128,256,512,1024,2048)
2) Reshape the block into 1 by 512 (Number of elements) matrix.
3) Take the last 1/4 of the matrix and put it infront,
e.g. Data = [Data(1,385:512),Data(1,1:384)];

The following is my code:

for i = 1 : NumOfBlock  
    if i == 1  
        Header = tempHeader(1:RowNeeded,:);  
        Header = reshape(Header,1,BlockSize); %BS  
        Header = [Header(1,385:512),Header(1,1:384)]; %CP  
        Data = tempData(1:RowNeeded,:);  
        Data = reshape(Data,1,BlockSize); %BS  
        Data = [Data(1,385:512),Data(1,1:384)]; %CP  
        start = RowNeeded + 1;  
        end1 = RowNeeded * 2;  
    else  
        temp = tempData(start:end1,:);  
        temp = reshape(temp,1,BlockSize); %BS  
        temp = [temp(1,385:512),temp(1,1:384)]; %CP  
        Data = [Data, temp];  
    end

    if i <= 127 & i > 1
        temp = tempHeader(start:end1,:);
        temp = reshape(temp,1,BlockSize); %BS
        temp = [temp(1,385:512),temp(1,1:384)]; %CP
        Header = [Header, temp];
    end

    start = end1 + 1;
    end1=end1 + RowNeeded;  
end

Running this loop with 5 million element will take more than 1 hour. I need it to be as fast as possible (in sec). Is this loop able to be vectorize?

A: 

It would be nice if you'd tell what you are trying to do (my guess is some simulation in dynamical systems, but it's hard to tell).

yes, of course it can be vectorized: each of your blocks is actually four sub blocks; using your (extremely non standard) indices:

1...128, 129...256, 257...384, 385...512

Every kernel/thread/what-ever-you-call-it of the vectorization should do the following:

i = threadIdx is between 0 and 127 temp = data[1 + i] data[1 + i] = data[385+i] data[385 + i] = data[257+i] data[257 + i] = data[129+i] data[129 + i] = temp

You should of course also parallelize on blocks, not only vectorize.

David Lehavi
I am doing Block Spreading in CDMA. temp = reshape(temp,1,BlockSize); is to take 1st column and turn it to a 1 by N matrix. Repeat for the rest of the columns and append it to the end of the 1st 1 by N matrix.temp = [temp(1,385:512),temp(1,1:384)]; is to do a cyclic prefix insertion.
HH
+3  A: 

Vectorizing may or may not help. What will help is knowing where the bottleneck is. Use the profiler as outlined here:

http://blogs.mathworks.com/videos/2006/10/19/profiler-to-find-code-bottlenecks/

MatlabDoug
+3  A: 

Based on your function description, here's what I came up with:

M = 320;           %# M must be divisble by (numberOfElements/8)
A = rand(M,8);     %# input matrix

num = 512;         %# numberOfElements
rows = num/8;      %# rows needed

%# equivalent to taking the last 1/4 and putting it in front
A = [A(:,7:8) A(:,1:6)];

%# break the matrix in blocks of size (x-by-8==512) into the third dimension
B = permute(reshape(A',[8 rows M/rows]),[2 1 3]);

%'# linearize everything
B = B(:);

this diagram might help in understanding the above:

alt text

Amro
Thanks! although it is not exactly what I want but your solution has given me an idea on how to go about solving my problem. Sorry for not making myself clear in the question.
HH
A: 

Once again I would like to thanks Amro for giving me an idea on how to solve my question. Sorry for not making myself clear in the question.

Here is my solution to my problem:

%#BS CDMA, Block size 128,512,1024,2048  
  BlockSize = 512;  
  RowNeeded = BlockSize / 8;  
  TotalRows = size(tempData);  
  TotalRows = TotalRows(1,1);  
  NumOfBlock = TotalRows / RowNeeded;  
  CPSize = BlockSize / 4;  

%#spilt into blocks  
  Header = reshape(tempHeader',[RowNeeded,8, 128]);  
  Data = reshape(tempData',[RowNeeded,8, NumOfBlock]);  
  clear tempData tempHeader;  

%#block spread & cyclic prefix  
    K = zeros([1,BlockSize,128],'single');  
    L = zeros([1,BlockSize,NumOfBlock],'single');  
       for i = 1:NumOfBlock  
           if i <= 128  
              K(:,:,i) = reshape(Header(:,:,i),[1,BlockSize]);  
              K(:,:,i) = [K((CPSize*3)+1:BlockSize),K(1:CPSize*3)];
           end  
           L(:,:,i) = reshape(Data(:,:,i),[1,BlockSize]);  
           L(:,:,i) = [L((CPSize*3)+1:BlockSize),L(1:CPSize*3)];
        end
HH