Hi,

I'm interested in image scaling algorithms and have implemented the bilinear and bicubic methods. However, I have heard of Lanczos and other more sophisticated methods for even higher-quality image scaling, and I am very curious how they work.

Could someone here explain the basic idea behind scaling an image using Lanczos (both upscaling and downscaling) and why it results in higher quality?

I do have a background in Fourier analysis and have done some signal processing in the past, though not in relation to image processing, so don't be afraid to use terms like "frequency response" and such in your answer :)

EDIT: I guess what I really want to know is the concept and theory behind using a convolution filter for interpolation.

(Note: I have already read the Wikipedia article on Lanczos resampling, but it didn't have nearly enough detail for me.)

thanks a lot!

+1  A: 

Did this other question help?

http://stackoverflow.com/questions/943781/where-can-i-find-a-good-read-about-bicubic-interpolation-and-lanczos-resampling

Lou Franco
Hi, thanks :) That is a good answer that covers some of what I wanted to know, but I'm still curious about why the sinc function was chosen, and in general about the use of convolution filters for interpolation and image scaling. I would like an explanation that explains their use in terms of their frequency response / Fourier transform.
banister
+8  A: 

The selection of a particular filter for image processing is something of a black art, because the main criterion for judging the result is subjective: in computer graphics, the ultimate question is almost always: "does it look good?". There are a lot of good filters out there, and the choice between the best frequently comes down to a judgement call.

That said, I will go ahead with some theory...


Since you are familiar with Fourier analysis for signal processing, you don't really need to know much more to apply it to image processing -- all the filters of immediate interest are "separable", which basically means you can apply them independently in the x and y directions. This reduces the problem of resampling a (2-D) image to the problem of resampling a (1-D) signal. Instead of a function of time (t), your signal is a function of one of the coordinate axes (say, x); everything else is exactly the same.
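
To make "separable" concrete, here is a minimal Python/NumPy sketch (the function names are mine, not from any particular library): a generic 1-D resampler driven by an arbitrary kernel, applied first along the rows and then along the columns. The same routine handles both upscaling and downscaling; when shrinking, the kernel is stretched so it also acts as a low-pass filter (more on why below).

    import numpy as np

    def resample_1d(signal, new_length, kernel, support):
        """Resample a 1-D signal with an arbitrary (symmetric) interpolation kernel.
        kernel  -- callable mapping an offset (in source samples) to a weight
        support -- half-width of the kernel, in source samples
        """
        old_length = len(signal)
        scale = old_length / new_length        # > 1 when shrinking
        stretch = max(scale, 1.0)              # widen the kernel when shrinking
        out = np.zeros(new_length)
        for i in range(new_length):
            center = (i + 0.5) * scale - 0.5   # output sample centre, in source coords
            lo = int(np.floor(center - support * stretch))
            hi = int(np.ceil(center + support * stretch))
            taps = np.arange(lo, hi + 1)
            weights = np.array([kernel((t - center) / stretch) for t in taps])
            values = signal[np.clip(taps, 0, old_length - 1)]   # clamp at the edges
            out[i] = np.dot(weights, values) / weights.sum()    # normalise the weights
        return out

    def resample_2d(image, new_shape, kernel, support):
        """Separable resampling: run the same 1-D filter along rows, then columns."""
        rows = np.stack([resample_1d(r, new_shape[1], kernel, support) for r in image])
        return np.stack([resample_1d(c, new_shape[0], kernel, support) for c in rows.T]).T

    # e.g. a triangle kernel (the bilinear kernel) with support 1:
    img = np.random.rand(64, 64)
    tri = lambda x: max(0.0, 1.0 - abs(x))
    out = resample_2d(img, (48, 48), tri, 1)   # -> shape (48, 48)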

Ultimately, the reason you need to use a filter at all is to avoid aliasing: if you are reducing the resolution, you need to filter out the high-frequency content of the original that the new, lower resolution cannot represent; otherwise it gets folded onto unrelated lower frequencies instead.
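
A quick numerical illustration of what "folded onto unrelated frequencies" means (just a toy, not part of a resampler): decimating a signal without filtering first makes a too-high frequency reappear as a spurious lower one.

    import numpy as np

    n = 256
    x = np.arange(n)
    signal = np.sin(2 * np.pi * 100 * x / n)   # 100 cycles across 256 samples

    naive = signal[::4]                        # keep every 4th sample -> 64 samples
    # 64 samples can only represent up to 32 cycles (the new Nyquist limit),
    # so the 100-cycle component doesn't vanish -- it folds down to a false one:
    print(np.argmax(np.abs(np.fft.rfft(naive))))    # prints 28: a frequency that
                                                    # was never in the original signal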

So. While you're filtering out unwanted frequencies from the original, you want to preserve as much of the original signal as you can. Also, you don't want to distort the signal you do preserve. Finally, you want to extinguish the unwanted frequencies as completely as possible. This means -- in theory -- that a good filter's frequency response should be a "box" function: unity response for frequencies below the cutoff, zero response for frequencies above it, and an abrupt step at the cutoff itself. And, in theory, this response is achievable: as you may know, a straight sinc filter will give you exactly that.
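
In 1-D, that "box in frequency" ideal corresponds to reconstructing the signal as a sinc-weighted sum over all of its samples. A toy sketch of that (the helper name is mine; np.sinc(x) is sin(pi x)/(pi x)):

    import numpy as np

    def sinc_reconstruct(samples, t):
        """Ideal band-limited reconstruction at (fractional) position t:
        a sinc-weighted sum over *all* samples."""
        n = np.arange(len(samples))
        return np.dot(samples, np.sinc(t - n))

    samples = np.cos(2 * np.pi * 3 * np.arange(32) / 32)   # 3 cycles: well below Nyquist
    print(sinc_reconstruct(samples, 10.5))    # close to the true value below
    print(np.cos(2 * np.pi * 3 * 10.5 / 32))  # (truncating the infinite sum costs a little accuracy)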


There are two problems with this. First, a straight sinc filter is unbounded, and doesn't drop off very fast; this means that doing a straightforward convolution will be very slow. Rather than direct convolution, it is faster to use an FFT and do the filtering in frequency space...
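
For completeness, this is what "filtering in frequency space" looks like for the ideal low-pass case (again a toy; zeroing bins like this amounts to a circular convolution with the ideal kernel, done in O(n log n)):

    import numpy as np

    def ideal_lowpass(signal, cutoff):
        """Zero every frequency bin above `cutoff` (in cycles per signal length)."""
        spectrum = np.fft.rfft(signal)
        spectrum[cutoff + 1:] = 0.0
        return np.fft.irfft(spectrum, n=len(signal))

    x = np.arange(256)
    signal = np.sin(2 * np.pi * 5 * x / 256) + np.sin(2 * np.pi * 90 * x / 256)
    filtered = ideal_lowpass(signal, cutoff=32)   # keeps the 5-cycle part, removes the 90-cycle part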

Second, even if you do use a straight sinc filter, the result doesn't actually look very good! As the related question says, perceptually there are ringing artifacts, and practically there is no completely satisfactory way to deal with the negative pixel values that result from "undershoot".

Finally, then: one way to deal with the problem is to start out with a sinc filter (for its good theoretical properties), and tweak it until you have something that also solves your other problems. Specifically, this will get you something like the Lanczos filter:

Lanczos filter:       L(x)     = sinc(pi x) sinc(pi x/a) box(|x| < a)
frequency response: F[L(x)](f) = box(|f| < 1/2) * box(|f| < 1/(2a)) * sinc(2 pi f a)

    [note that "*" here is convolution, not multiplication]
    [also, I am ignoring normalization completely...]
  • the sinc(pi x) determines the overall shape of the frequency response (for larger a, the frequency response looks more and more like a box function)
  • the box(|x|<a) gives it finite support, so you can use direct convolution
  • the sinc(pi x/a) smooths out the edges of the box and (consequently? equivalently?) greatly improves the rejection of undesirable high frequencies
  • the last two factors ("the window") also tone down the ringing; they make a vast improvement in both the perceptual artifact and the practical incidence of "undershoot" -- though without completely eliminating them

Please note that there is no magic about any of this. There are a wide variety of windows available, which work just about as well. Also, for a=1 and 2, the frequency response does not look much like a step function. However, I hope this answers your question "why sinc", and gives you some idea about frequency responses and so forth.
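
If you want to experiment, the kernel above is only a few lines of code. Here is a sketch using NumPy (whose np.sinc(x) is sin(pi x)/(pi x), matching the sinc(pi x) notation above):

    import numpy as np

    def lanczos(x, a=3):
        """Lanczos kernel: sinc windowed by a wider sinc, truncated at |x| >= a.
        a = 2 or 3 are the usual choices ("Lanczos2", "Lanczos3")."""
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

    # e.g. the six tap weights for a value 0.3 samples to the right of sample 0 (a = 3):
    offsets = np.arange(-2, 4) - 0.3           # sample offsets -2.3 ... +2.7
    weights = lanczos(offsets)
    weights /= weights.sum()                   # normalise so flat areas stay flat

Dropping this kernel (with support = a) into a generic convolution-based resampler, such as the separable sketch earlier in this answer, gives you Lanczos scaling in both directions.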

comingstorm
a great and detailed answer :) thank you very much :))
banister