How do you implement the fastest possible Gaussian blur algorithm ?
I am going to implement it in Java, so GPU solutions are ruled out. My application planetGenesis is cross platform, so I dont want JNI.
How do you implement the fastest possible Gaussian blur algorithm ?
I am going to implement it in Java, so GPU solutions are ruled out. My application planetGenesis is cross platform, so I dont want JNI.
You probably want the box blur, which is much faster. See this link for a great tutorial and some copy & paste C code.
I would consider using CUDA or some other GPU programming toolkit for this, especially if you want to use a larger kernel. Failing that, there's always hand tweaking your loops in assembly.
Math jocks are likely to know this, but for anyone else..
Due to a nice mathematical propertiy of the Gaussian, you can blur a 2D image quickly by first running a 1D Gaussian blur on each row of the image, then run a 1D blur on each column.
Step 1: SIMD 1-dimensional gaussian blur
Step 2: transpose
Step 3: Repeat step 1
(Best done on small blocks, as a full-image transpose is slow, while a small-block transpose can be done extremely fast using a chain of punpck)
You should use the fact that a Gaussian kernel is separable, i. e. you can express a 2D convolution as a combination of two 1D convolutions.
If the filter is large, it may also make sense to use the fact that convolution in the spatial domain is equivalent to multiplication in the frequency (Fourier) domain. This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform. The complexity of take the FFT (Fast Fourier Transform) is O(n log n), while the complexity of a convolution is O(n^2). Also, if you need to blur many images with the same filter, you would only need to take the FFT of the filter once.
If you decide to go with using an FFT, the FFTW library www.fftw.org is a good choice.
I found this. Update : This method contains a lot of approximations like using integers and look up tables instead of floats and floating point divisions. I dont know how much speedup that is in modern Java code.
Also these guys have a approximating algorithm using B-splines. This guy claims to have some cool optimizations.
Also, David Everly has a fast method for Gaussian blur processing.
I would try out the various methods, benchmark them and post the results here.
Update: For my purposes, I have implemented/copied from the internet the basic ( process X-Y axis independently ) method and the David Everly method. They differ in parameters, so I couldn't compare them directly. However the latter goes through much less number of iterations for large blur radius. Also, the latter is an approximate algorithm.
For larger blur radiuses, try applying a box blur 3 times. This will approximate a Gaussian blur very well, and be much faster than a true Gaussian blur.
In 1D:
Blurring using almost any kernel repeatedly will tend to a Gaussian kernel. This is what's so cool about the Gaussian distribution, and is why statisticians like it. So choose something that's easy to blur with and apply it several times.
For example, it's easy to blur with a box shaped kernel. First calculate a cumulative sum:
y(i) = y(i-1)+x(i)
then:
blurred(i) = y(i+radius)-y(i-radius)
Repeat several times.
Or you might go back and forth a few times with some variety of IIR filter, these are similarly fast.
In 2D or higher:
Blur in each dimension one after the other, as DarenW said.
This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform.
I'm trying to implement this but I don't understand, how to multiply the results. the transformed image I is a 512x512 matrix the transformed kernel K is a 7x7 matrix
How can I multiply I and K ??
I struggled with this problem for my research and tried and interesting method for a fast Gaussian blur. First, as mentioned, it is best to separate the blur into two 1D blurs, but depending on your hardware for the actual calculation of the pixel values, you can actually pre-compute all possible values and store them in a look-up table. In other words, pre-calculate every combination of Gaussian coefficient * input pixel value. Of course you will need to discreetize your coefficients, but just wanted to add this solution. If you have an IEEE subscription, you can read more here: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5213780
Ultimately, I ended up using CUDA though :)
I'm trying to implement this but I don't understand, how to multiply the results. the transformed image I is a 512x512 matrix the transformed kernel K is a 7x7 matrix
pad the kernel with zeros up to the image size before applying fft.