views:

431

answers:

2

Hello,

This is not a "programming" question. But I'm sure it's something that is widely known and understood in this community.

I have an image, x, and a much smaller image, y, and I need to convolve the two by multiplying their FFTs. But since they are not the same size I don't know how to do the frequency domain multiplication.

I take the (two-dimensional) FFT of x (which is an integer matrix of dimensions 4096 x 4096), which gives me the frequency domain representation, X (which is a matrix of complex numbers and I think it's dimension is 2048 x 2048).

Similarly, I take the (two-dimensional FFT of y (which is an integer matrix of dimension 64 x 64), which gives me the frequency domain representation, Y (which is also a matrix of complex numbers and I think it's dimension is 32 x 32).

I'm using the fourn function in Numerical Recipes, so my input matrices, x and y must be collapsed into one-dimensional arrays, which get replaced by their discrete Fourier transforms, X and Y. The point being that even though this is a two-dimensional problem with images, I am working with one-dimensional arrays.

If I were trying to convolve two images of the exact same dimensions, x and y. It would all be very straightforward:

 X = FFT(x)

 Y = FFT(y)

 Z = X * Y (term by term multiplication)

 Convolution of x and y = IFFT(Z)

But if X and Y are different lengths, how do I do the multiplication?

One possibility is to pad out y to have the same dimensions as x. But this seems horribly inefficient. Another possibility is to pad out Y to have the same dimensions as X. But I don't know what this means in frequency space.

Here's another way of asking this question: If I want to convolve two images of very different dimensions using FFTs so I can do multiplication of their spectra (frequency domain representation), how do I do that multiplication?

Thanks,

~Michael.

+3  A: 

Padding the smaller array (the convolution kernel, y in your case) with zeroes to match the input image size (your matrix x) is the standard approach. That would be horribly inefficient if you were doing the convolution in the spatial domain, but if you're multiplying the FFTs, it's necessary, and the cost of computing the FFT of the padded array isn't too bad.

Jim Lewis
+1 (awhile ago)... it might also be worth mentioning that padding the image can also be useful. Since the FFT assumes a periodic image, the convolution can mix the edges together.
tom10
+3  A: 

You are correct in thinking the two frequency spacings need to be the same. Take a 1D example (I'm using Matlab syntax):

N = 4096;
M = 64;

x = randn(N, 1);
y = hann(M, 'symmetric');

zLinear = conv(x,y);
zCircular = ifft( fft(x) .* fft(y,N) );

disp(max(abs(zLinear(65:4096) - zCircular(65:4096))));

The difference between the two techniques is ~2e-14, so roundoff error. Note that you have to skip the first 64 samples due to the difference between linear and circular convolution.

In the calculation of zCircular, note fft(y,N), which is Matlab syntax for padding out the y signal with zeros up to N before calculating the fft. This might be considered inefficient in memory usage, but compare the speed:

linear convolution: 4096 multiply/adds of 64 each = 262144 multiply/adds

circular convolution: 2 FFTs of 4096 + 1 complex multiply of 2 * 4096 elements + 1 inverse FFT
= 3 * 4096 * log2(4096) + 4096 * 6 = 172032 (assuming 6 operations for the complex multiply)

Basically, the NlogN speed of the FFT, even when you need three of them, beats the N * M convolution operation unless M is very short.

EDIT Add speed estimate for 2D case

It's worth adding that for 2D data the speed advantage is magnified. A 2D FFT takes N * N * log2(N * N) operations, so the 3 FFTs + complex N^2 array multiply for N = 4096 is 1.3e10 operations. But the direct convolution is N^2 * M^2 = 6.9e10 operations, some 50 times slower.

mtrw