I'm trying to understand the effect of the block size, and the best strategy for choosing the coefficients, in DCT compression. Basically I want to ask what I wrote here:

http://stackoverflow.com/questions/4582/video-compression-what-is-discrete-cosine-transform/1948138#1948138

Let's assume the most primitive compression: splitting an image into blocks, performing a DCT on each block, and zeroing out some coefficients.

To my understanding, the smaller the block the better. Smaller blocks mean the pixels are more correlated, hence the energy in the DCT spectrum is more "compact". This should be even more pronounced in fast-varying (high-frequency) images.

Let's say we zero out a certain percentage of the coefficients: what would result in the best image quality, small or large blocks? Let's say we keep 10%, 25%, 50%, or 75%; would you say the answer differs for different percentages?

Another issue is how to choose the coefficients you leave untouched. Let's say I have to make the decision based on location and not energy. Would you take a square from the top left corner? I've averaged many blocks in the DCT spectrum and concluded the best would be taking a triangle from the top left corner. What do you think?

Hopefully we'll have an effective discussion.

+1  A: 

The essence of your question seems to be about image quality. A considerable literature has been produced on the subject, and the upshot is that image quality is a hard thing to determine.

Standard mathematical error measures like the signal-to-noise ratio (SNR) and mean-squared error (MSE) can give a quantitative answer, but it is well known that these don’t correlate well with subjective viewer opinions, which must be our final authority. No other methods, even those founded on psycho-visual models of the viewer (e.g., S.A. Karunasekera and N.G. Kingsbury, “A distortion measure for blocking artifacts in images based on human visual sensitivity”, IEEE Trans. on Image Proc. vol. 4, no. 6, June 1995, pp. 713–724; and M. Miyahara, K. Kotani, and V. R. Algazi, “Objective picture quality scale (PQS) for image coding,” IEEE Trans. on Comm. vol. 46, no. 9, Sept. 1998, pp. 1215–1226), have proven themselves to be better than SNR.

Moreover, when you vary the type of imagery (line drawing, cartoon, photo, portrait, etc.), certain types of compression distortion become more evident. Mosquito noise might be objectionable in one image, while staircase noise might be the culprit in another.

In short, there is no pat answer to your question, "what would result in best image quality?"

That being said, we can say some things about the DCT that are of relevance. The coefficients in the DCT of a block go from low variation to high variation in a zig-zag pattern from the top left corner [(0,0)->(0,1)->(1,0)->(2,0)->(1,1)->(0,2)->etc.], which your triangle selection mirrors. The closer a coefficient is to the top left corner, the smoother the information it contains [in fact, the (0,0) DCT value is proportional to the average of the whole block], and the farther away from that corner you get, the more "high frequency" detail it represents. The closer a coefficient is to the top row or left column of the block, the more it represents purely horizontal or vertical detail, and the closer it is to the diagonal of the block, the more it represents diagonal detail.
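To make the scan order concrete, here is a minimal sketch that generates the zig-zag sequence for an n x n block and the matching top-left triangle mask. The function names `zigzag_order` and `triangle_mask` are my own, chosen for illustration; indices are (row, col).

```python
def zigzag_order(n):
    """Return (row, col) indices of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col picks out one anti-diagonal
        diagonal = [(row, s - row) for row in range(n) if 0 <= s - row < n]
        if s % 2 == 0:  # even diagonals are walked from bottom-left to top-right
            diagonal.reverse()
        order.extend(diagonal)
    return order


def triangle_mask(n, t):
    """Keep-mask for the top-left triangle: True where row + col < t."""
    return [[row + col < t for col in range(n)] for row in range(n)]


print(zigzag_order(8)[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

For t no larger than n, the first t(t+1)/2 entries of the zig-zag scan are exactly the positions inside the triangle row + col < t, which is why a triangle is the natural "location only" selection.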

In brief, lossy compression usually entails throwing away some of the "details" that may not be perceptible to the eye. (Throwing away the "smoother" DCT values results in severe distortion.) The more DCT values you throw away, the greater your compression ratio will be, but also the greater distortion you'll induce.
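As a rough illustration of that trade-off, the sketch below transforms one 8x8 block, keeps only a top-left triangle of coefficients, and measures the reconstruction error. It assumes an orthonormal DCT-II written out in plain Python (slow, but dependency-free); the test block and all helper names are made up for the example. With this scaling the (0,0) coefficient equals N times the block mean.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))   # C * X * C^T


def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)  # C^T * Y * C


def keep_triangle(coeffs, t):
    """Zero every coefficient whose indices satisfy row + col >= t."""
    return [[x if u + v < t else 0.0 for v, x in enumerate(row)]
            for u, row in enumerate(coeffs)]


def mse(a, b):
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


# Hypothetical test block: a smooth ramp.
block = [[10.0 * r + 5.0 * c for c in range(8)] for r in range(8)]
coeffs = dct2(block)
for t in (1, 2, 4, 8, 15):  # t = 15 keeps every coefficient of an 8x8 block
    reconstructed = idct2(keep_triangle(coeffs, t))
    print(t, round(mse(block, reconstructed), 4))
```

Because the transform is orthonormal, the squared error is just the energy of the discarded coefficients, so the error can only shrink as the triangle grows. How fast it shrinks depends entirely on the block content.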

As for block size, it all depends. The more variance and detail there is in a block, the more you'll lose by throwing away coefficients. Some compression algorithms adaptively use different block sizes within the same image so that high-detail regions receive more and smaller blocks and smooth regions receive fewer and larger blocks.

For algorithms that use a single block size, 8x8 is the classic choice (JPEG and MPEG-1/2 apply the DCT to 8x8 blocks), while 16x16 and 32x32 transforms appear in newer codecs such as HEVC. A fixed block size requires less processing than an adaptive scheme, but the quality will generally be lower as well.
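If you want to try the fixed-block-size comparison yourself, here is a small experiment sketch. It assumes a synthetic 32x32 test image, an orthonormal DCT-II in plain Python, and a position-based rule that keeps the lowest diagonals first (a simplification of the zig-zag scan); every name in it is hypothetical. The numbers it prints describe this one synthetic image only and should not be read as a general ranking of block sizes.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def compress_block(block, keep_fraction):
    """DCT the block, keep the lowest-diagonal coefficients, invert."""
    n = len(block)
    c = dct_matrix(n)
    ct = [list(r) for r in zip(*c)]
    coeffs = matmul(matmul(c, block), ct)
    # Selection by position only: sort by diagonal index (u + v), then by row.
    positions = sorted(((u, v) for u in range(n) for v in range(n)),
                       key=lambda p: (p[0] + p[1], p[0]))
    kept = set(positions[:max(1, round(keep_fraction * n * n))])
    pruned = [[coeffs[u][v] if (u, v) in kept else 0.0 for v in range(n)]
              for u in range(n)]
    return matmul(matmul(ct, pruned), c)


def blockwise_mse(image, n, keep_fraction):
    """MSE over the whole image after compressing each n x n block."""
    size = len(image)
    total = 0.0
    for r in range(0, size, n):
        for c in range(0, size, n):
            block = [row[c:c + n] for row in image[r:r + n]]
            rec = compress_block(block, keep_fraction)
            total += sum((x - y) ** 2
                         for rb, rr in zip(block, rec) for x, y in zip(rb, rr))
    return total / (size * size)


def synthetic_image(size=32):
    """Made-up smooth test pattern; replace with real pixel data."""
    return [[128 + 60 * math.sin(x / 5.0) + 40 * math.cos(y / 3.0)
             + 20 * math.sin(x * y / 40.0)
             for x in range(size)] for y in range(size)]


image = synthetic_image()
for fraction in (0.10, 0.25, 0.50, 0.75):
    results = {n: blockwise_mse(image, n, fraction) for n in (4, 8, 16)}
    print(fraction, {n: round(err, 3) for n, err in results.items()})
```

Swapping in real images of different types (photo, line drawing, cartoon) is the interesting part, since the ranking of block sizes can change with the content and with the kept fraction.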

mlimber
Thanks for the thorough answer. Of course, in my question I assumed a single block size. I don't care about SNR / MSE; I just want to understand the theory. As far as I can see, the smaller the blocks the better, because smaller blocks usually mean more correlated pixels, which are much easier to compress (or, let's say, are better described by fewer coefficients). Am I right in those assumptions? I just want to get the theory straight and then understand the specific case for a specific image.
Drazick
Generally speaking, smaller blocks are better as far as correlation goes, but if they're too small your compression ratio will suffer because you won't be able to throw away many coefficients from each block without causing severe distortion.
mlimber
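One simple way to see the limit described in the last comment: if every block must keep at least its (0,0) coefficient, the kept fraction can never drop below 1/n² for n x n blocks. A tiny sketch, with `min_kept_fraction` being a name made up for the illustration:

```python
def min_kept_fraction(n):
    """Smallest fraction of coefficients kept if each n x n block retains its DC term."""
    return 1.0 / (n * n)


for n in (2, 4, 8, 16):
    print(n, f"{100 * min_kept_fraction(n):.2f}%")
# 2 25.00%
# 4 6.25%
# 8 1.56%
# 16 0.39%
```

So with 2x2 blocks the "keep 10%" case from the question is already out of reach without discarding the block averages themselves.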