views:

1648

answers:

4

Can anyone explain in a simple, clear way how MPEG4 works to compress data? I'm mostly interested in video. I know there are different standards or parts to it. I'm just looking for the predominant overall compression method, if there is one with MPEG4.

+1  A: 

This link might be helpful

Bharani
I'm sure that the OP could find the standard; they were interested in a simple explanation (if there is one).
Ed Swangren
+1  A: 

MPEG4 uses a variety of techniques to compress video.

If you haven't already looked at Wikipedia, this would be a good starting point.

There is also this article from the IEEE which explains these techniques in more detail.

Michael
+3  A: 

Like any other popular video codec, MPEG4 uses a variation of the discrete cosine transform and a variety of motion-compensation techniques (which you can think of as motion prediction, if that helps) that reduce the amount of data needed for subsequent frames. This page has an overview of what is done by plain MPEG4.

It's not totally dissimilar to the techniques used by JPEG.

greyfade
+6  A: 

MPEG-4 is a huge standard, and employs many techniques to achieve the high compression rates that it is capable of.

In general, video compression is concerned with throwing away as much information as possible whilst having a minimal effect on the viewing experience for an end user. For example, using subsampled YUV (typically 4:2:0) instead of RGB cuts the raw video data in half straight away. This is possible because the human eye is less sensitive to colour than it is to brightness. In YUV, the Y value is brightness, and the U and V values represent colour. Therefore, you can throw away some of the colour information, which reduces the file size without the viewer noticing any difference.
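To make that concrete, here is a rough sketch of the YUV conversion and subsampling step (NumPy; the BT.601-style coefficients and plain 2x2 averaging for 4:2:0 are illustrative assumptions, real encoders vary in the exact matrices and filtering):

    import numpy as np

    def rgb_to_yuv420(rgb):
        # Convert an RGB frame (H x W x 3, floats) into a full-resolution Y plane
        # and 2x2-subsampled U and V planes (4:2:0).
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y = 0.299 * r + 0.587 * g + 0.114 * b      # brightness, kept at full resolution
        u = 0.492 * (b - y)                        # colour-difference channels...
        v = 0.877 * (r - y)
        # ...kept at quarter resolution by averaging each 2x2 block
        u_sub = u.reshape(u.shape[0] // 2, 2, u.shape[1] // 2, 2).mean(axis=(1, 3))
        v_sub = v.reshape(v.shape[0] // 2, 2, v.shape[1] // 2, 2).mean(axis=(1, 3))
        return y, u_sub, v_sub

    frame = np.random.rand(480, 640, 3)            # stand-in for a real frame
    y, u, v = rgb_to_yuv420(frame)
    print(frame.size, y.size + u.size + v.size)    # 921600 vs 460800: half the samples

The viewer barely notices because all of the brightness detail is preserved; only the colour planes are stored at lower resolution.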

After that, most compression techniques take advantage of two redundancies in particular: temporal redundancy and spatial redundancy.

Temporal redundancy refers to the fact that successive frames in a video sequence are very similar. Typically a video would be in the order of 20-30 frames per second, and not much changes in 1/30 of a second. Take any DVD and pause it, then step forward one frame and note how similar the two images are. So, instead of encoding each frame independently, MPEG-4 (and other compression standards) encodes only the difference between successive frames, using motion estimation to find how blocks have moved from one frame to the next.
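The usual way to exploit this is block matching: for each block of the current frame, search the previous frame for the region that matches it best, then store just the motion vector and the small residual that remains. A bare-bones sketch (NumPy; the 16x16 block size, the ±8-pixel search range and the exhaustive SAD search are illustrative choices, not what any particular encoder does):

    import numpy as np

    def best_motion_vector(prev, cur, top, left, block=16, search=8):
        # Find the offset (dy, dx) into the previous frame whose block best
        # matches the block of the current frame at (top, left), using the
        # sum of absolute differences (SAD) as the matching cost.
        target = cur[top:top + block, left:left + block]
        best_cost, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                    continue
                cost = np.abs(prev[y:y + block, x:x + block] - target).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
        return best_mv

    # Toy example: the current frame is the previous frame shifted 3 pixels right,
    # so the best match for an interior block is found 3 pixels to the left.
    prev = np.random.rand(64, 64)
    cur = np.roll(prev, 3, axis=1)
    print(best_motion_vector(prev, cur, 16, 16))   # (0, -3)

An encoder then stores the motion vector plus the residual (the per-pixel difference left over after the match), which takes far fewer bits than the raw block.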

Spatial redundancy takes advantage of the fact that, in general, the colour across an image tends to vary slowly, i.e. most of the content is low frequency. By this I mean that neighbouring pixels tend to have similar colours. For example, in an image of you wearing a red jumper, all of the pixels that represent your jumper would have a very similar colour. It is possible to use the DCT to transform the pixel values into frequency space, where some of the high-frequency information can be thrown away. Then, when the reverse DCT is performed (during decoding), the image is reconstructed without that high-frequency information.
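A sketch of that transform-and-discard step on a single 8x8 block (NumPy; the crude "keep only the top-left coefficients" rule is a stand-in for the quantisation tables a real codec uses):

    import numpy as np

    N = 8
    # Orthonormal DCT-II basis matrix: dct_mat @ block @ dct_mat.T is the 2D DCT.
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    dct_mat = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    dct_mat[0, :] = np.sqrt(1.0 / N)

    # A smooth 8x8 block (a gentle gradient, like a patch of that red jumper).
    block = np.add.outer(np.linspace(100, 120, N), np.linspace(0, 10, N))

    coeffs = dct_mat @ block @ dct_mat.T     # transform into frequency space
    kept = np.zeros_like(coeffs)
    kept[:4, :4] = coeffs[:4, :4]            # throw away the high-frequency coefficients
    restored = dct_mat.T @ kept @ dct_mat    # reverse DCT, as done during decoding

    print(np.abs(block - restored).max())    # small error: the block was mostly low frequency

Because neighbouring pixels are so similar, almost all of the block's energy sits in the low-frequency coefficients, so discarding the rest barely changes the decoded pixels.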

To view the effect of throwing away high-frequency information, open MS Paint and draw a series of overlapping horizontal and vertical black lines. Save the image as a JPEG (which also uses the DCT for compression). Now zoom in on the pattern and notice how the edges of the lines are no longer sharp and are kinda blurry. This is because some high-frequency information (the sharp transition from black to white) has been thrown away during compression. Read this for an explanation with nice pictures.
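The same effect can be reproduced numerically: take a hard black-to-white edge, keep only its low-frequency DCT coefficients, and the reconstruction comes back smeared, with ripples around the transition (a minimal 1D sketch with NumPy, built the same way as the 8x8 example above):

    import numpy as np

    N = 16
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    dct_mat = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    dct_mat[0, :] = np.sqrt(1.0 / N)

    edge = np.array([0.0] * 8 + [255.0] * 8)   # a hard black-to-white transition
    coeffs = dct_mat @ edge
    coeffs[4:] = 0                             # discard the high-frequency coefficients
    blurred = dct_mat.T @ coeffs               # the edge comes back blurred, with ringing
    print(np.round(blurred))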

For further reading, this book is quite good, if a little heavy on the maths.

Lehane
Sharp edges aren't high-frequency, because they don't repeat. They're actually a sum of every frequency at once, which is why DCT compression and some image resizers will add noise around lines instead of blurring them.
alex strange