tags:

views:

83

answers:

2

The bottleneck in my program is computing the sign of a number for all numbers in an array, when the array size is very large. I show the two approaches I've tried below, both with similar results. I have 16GB of RAM, and the array occupies ~5GB. The problem I'm seeing is the sign function takes up a lot of RAM+virtual memory. Anyone know of a way to reduce the memory requirements and speed up this process for putting the sign of array input into array output (see below)?

Using a for loop with if or switch commands doesn't run out of memory, but takes an hour to complete (way too long).

size = 1e9; % size of large array (just an example, could be larger)
output = int8(zeros(size,1)-1); % preallocate to -1
input = single(rand(size,1));   % create random array between 0 and 1
scalar = single(0.5); % just a scalar number, set to 0.5 (midpoint) for example

% approach 1 (comment out when using approach 2)
output = int8(sign(input - scalar));  % this line of code uses a ton of RAM and virtual memory

% approach 2
output(input>scalar) = 1;            % this line of code uses a ton of RAM and virtual memory
output(input==scalar) = 0;           % this line of code uses a ton of RAM and virtual memory

Thanks in advance for any suggestions.

+1  A: 

It may be that sign converts the input to double intermittently.

Anyway, if it's fine if output is 1 for positive and 0 for negative or zero, you could try

siz = 1e9; %# do not use 'size' as a variable, since it's an important function
input = rand(siz,1,'single'); %# this directly creates a single array
scalar = single(0.5);
output = input>scalar;

EDIT Actually, I see a short spike in memory usage even for this solution. Maybe this is related to multithreading? Anyway, the speed problem comes from the fact that you start paging, which slows everything to a crawl.

Jonas
Unfortunately, I need the output array to equal 1 to indicate the input element is positive, -1 for negative, and 0 for output values exactly equal to 0 (just like the 'sign' function would return).
ggkmath
+4  A: 

If you use a for loop but pass the data in in chunks, it's almost as fast as the fully-vectorized version, but without the memory overhead:

chunkSize = 1e7;
for start=1:chunkSize:size
    stop = min(start+chunkSize, size);
    output(start:stop) = int8(sign(input(start:stop)-scalar));
end

Also, your initialization code is creating double-precision arrays then converting them to single / integer arrays. You can save some temporary memory usage (and time) by doing:

input = rand(size, 1, 'single');
output = zeros(size, 1, 'int8') - 1;
Ray
Excellent suggestions Ray. Your solution increases the time < 10% while using virtually no memory. I think this will work. Awesome! Thanks so much!!!
ggkmath
Its sort of counter intuitive, but from my experimental results, increasing chunkSize from 1e7 increases the execution time, whereas decreasing it speeds up the execution time (I would have expected the other way around). Still works though.
ggkmath
Nope, decreasing chunkSize too much starts to increase the execution time. chunkSize ~1e6 seems like the sweet spot.
ggkmath

related questions