Is it useful to study the H.261 specification as an introduction to modern video compression technology, or should I start somewhere else? I'm not sure where to begin, but H.261 seems simple enough to make the concepts easy to grasp.

+1  A: 

The specification isn't a very good introduction -- it's written primarily to be precise, and contains little explanation of why things are the way they are. H.261 is closely related to MPEG-1 video, which builds on the same block-based design. One book I've used (and find quite well written) is MPEG Video Compression Standard, by Mitchell, Pennebaker, Fogg and LeGall. FWIW, it covers both MPEG-1 and MPEG-2 (MPEG-2 video is also published as H.262).

Jerry Coffin
+1  A: 

I partially agree with Jerry Coffin; I think H.261 is definitely a good starting point for anyone learning about video compression, but reading the specification directly is not a good idea.

The basic building blocks from H.261 that I would focus on are motion compensation, macroblocks, DCT to reduce spatial redundancy, and differential PCM (DPCM) to reduce temporal redundancy.

If I had to choose one general principle of video compression to learn first, it would be motion estimation and motion compensation. Try this thought exercise: imagine two consecutive video frames separated by only 1/30 of a second. They will be pretty similar, right? Without peeking at the Internet, how would you exploit the information already encoded in frame 1 to reduce the code length of frame 2? Now, go search for motion estimation.
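To make the thought exercise concrete, here is a minimal sketch of exhaustive block matching, one common way to do motion estimation. The function names, the 4x4 block size, the +/-2 pixel search range, and the sum-of-absolute-differences cost are all assumptions chosen for illustration; they are not taken from the H.261 text.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Try every displacement within +/-search pixels and return
    ((dx, dy), cost) for the cheapest match in the reference frame."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Hypothetical 8x8 test frames: the current frame is the reference frame
# with its content shifted one pixel to the right.
ref = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

print(best_motion_vector(cur, ref, 2, 2))  # ((-1, 0), 0)
```

The block at (2, 2) in the current frame is found one pixel to the left in the reference frame with zero error, so the encoder would only need to send the vector (-1, 0) instead of the pixels. Real encoders use faster search strategies than this brute-force loop.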

Next, how would you reduce spatial redundancy? H.261 takes an approach similar to JPEG: it applies the DCT to 8x8 blocks of samples.
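As a rough illustration of why the DCT helps with spatial redundancy, the sketch below computes an orthonormal 2-D DCT-II of an 8x8 block directly from the definition. The function name `dct_2d` and the flat test block are assumptions for the example, and the O(N^4) loop is written for readability, not speed.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency
        for u in range(N):      # horizontal frequency
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * s
    return out


# A perfectly flat block: all 64 samples carry the same value.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)

print(round(coeffs[0][0], 6))  # 800.0, the DC coefficient
```

For this flat block, all of the signal ends up in the single DC coefficient and the other 63 coefficients are zero up to floating-point error. Smooth image regions behave similarly, which is what makes the transformed block cheap to code.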

Edit: From Wang, Ostermann, and Zhang (pp. 293-4, on block-based hybrid video coding, which is essentially what H.261 is):

In this coder, each video frame is divided into blocks of a fixed size and each block is processed more or less independently, hence the designation "block-based." The word "hybrid" means that each block is coded using a combination of motion-compensated temporal prediction and transform coding. ... First a block is predicted from a previously coded reference frame using block-based motion estimation. The motion vector specifies the displacement between the current block and the best matching block. The predicted block is obtained from the previous frame on the estimated MV using motion compensation. Then, the prediction error block is coded, by transforming it using the DCT, quantizing the DCT coefficients, and converting them into binary codewords using variable-length coding.
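The second half of that description (prediction error, DCT, quantization) can be sketched as follows. This is only an illustration under stated assumptions: the names `encode_block` and `decode_block`, the single uniform quantizer step, and the test blocks are hypothetical, the predicted block is assumed to come from motion compensation already, and the variable-length coding stage is left out.

```python
import math

N = 8

# Orthonormal DCT-II basis matrix: C[k][n] = scale(k) * cos((2n + 1) k pi / 2N).
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


CT = transpose(C)


def encode_block(cur, pred, step):
    """Prediction error -> 2-D DCT -> uniform quantization to integer levels."""
    residual = [[cur[y][x] - pred[y][x] for x in range(N)] for y in range(N)]
    coeffs = matmul(matmul(C, residual), CT)
    return [[round(c / step) for c in row] for row in coeffs]


def decode_block(levels, pred, step):
    """Dequantize -> inverse DCT -> add the prediction back."""
    coeffs = [[level * step for level in row] for row in levels]
    residual = matmul(matmul(CT, coeffs), C)
    return [[pred[y][x] + round(residual[y][x]) for x in range(N)]
            for y in range(N)]


# Hypothetical blocks: the current block is the prediction plus a small
# uniform brightness change of 3.
pred = [[100 + 2 * x + y for x in range(N)] for y in range(N)]
cur = [[p + 3 for p in row] for row in pred]

levels = encode_block(cur, pred, step=8)
nonzero = sum(1 for row in levels for v in row if v != 0)
print(levels[0][0], nonzero)                   # 3 1
print(decode_block(levels, pred, 8) == cur)    # True
```

In this example only one quantized coefficient is nonzero, so the variable-length coder would have very little to transmit. With a less accurate prediction or a coarser step the reconstruction would no longer match the input exactly, which is where the loss in lossy video coding comes from.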

Steve