I'm working with an MPEG stream that uses an IBBP... GOP sequence. The (DTS,PTS) values returned for the first 4 AVPackets are as follows: I=(0,3) B=(1,1) B=(2,2) P=(3,6)

The PTS on the I frame looks legit, but the PTS on the B frames cannot be right, since the B frames shouldn't be displayed before the I frame as their PTS values indicate. I've also tried decoding the packets and using the pts value in the resulting AVFrame, but that PTS is always set to zero.

Is there any way to get an accurate PTS out of ffmpeg? If not, what's the best way to sync audio then?

A: 

Ok, scratch my previous confused reply.

For an IBBPBBI movie, you'd expect the PTSes to look like this (in decoding order):

0, 3, 1, 2, 6, 4, 5, ...

corresponding to the frames

I, P, B, B, I, B, B, ...

So you appear to be missing an I at the start of your sequence but otherwise the timestamps look correct.
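
The reordering above can be sketched in a few lines of Python. This is an illustration only, not ffmpeg code: the function name `decode_order` is made up, and it assumes that PTS is simply the display index and that every run of B frames needs the following I or P frame decoded first.

```python
def decode_order(display_types):
    """Return (frame_type, pts) pairs in decode order.

    display_types is a string such as "IBBPBBI" in display order.
    Assumption: PTS is the display index, and each run of B frames
    depends on the I/P frame that follows it in display order.
    """
    out = []
    pending_b = []  # B frames waiting for their forward reference
    for pts, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append((ftype, pts))
        else:
            # The anchor (I or P) goes first, then the B frames
            # that precede it in display order.
            out.append((ftype, pts))
            out.extend(pending_b)
            pending_b = []
    # Trailing B frames have no forward reference in this input;
    # they are simply kept at the end.
    out.extend(pending_b)
    return out


frames = decode_order("IBBPBBI")
print([ftype for ftype, _ in frames])  # ['I', 'P', 'B', 'B', 'I', 'B', 'B']
print([pts for _, pts in frames])      # [0, 3, 1, 2, 6, 4, 5]
```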

A: 

Right, that's the way it should look, but the actual frames are in IBBP order. avcodec_decode_video() doesn't return got_picture == 1 until I feed it the fourth packet (the P frame), which makes sense when the frames are in IBBP order instead of IPBB order.

I've also turned on debug logging and I've looked at the AVFrame.pict_type to verify that the first frame is indeed an I frame.

I've since found http://www.dranger.com/ffmpeg/tutorial05.html, which makes a complete hack out of calculating PTS, so it's looking hopeless.

hobb0001
A: 

I think I finally figured out what's going on based on a comment made in http://www.dranger.com/ffmpeg/tutorial05.html:

ffmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() will always be the same as the PTS of the frame it returns

Translation: If I feed a packet into avcodec_decode_video() that has a PTS of 12, avcodec_decode_video() will not return the decoded frame contained in that packet until I feed it a later packet that has a DTS of 12. If the packet's PTS is the same as its DTS, then the packet given is the same as the frame returned. If the packet's PTS is 2 frames later than its DTS, then avcodec_decode_video() will delay the frame and not return it until I provide 2 more packets.
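
As a rough illustration of that rule, here is a small Python model using the four packets from the question. It is not ffmpeg's decoder, just the rule as stated: a frame is released when the packet being fed has a DTS equal to that frame's PTS. The name `output_schedule` and the tuple layout are made up for the example.

```python
def output_schedule(packets):
    """Map each packet to the index of the packet that releases its frame.

    packets: list of (frame_type, dts, pts) in the order they are fed.
    Assumed model: a frame comes out when the packet being fed has a
    DTS equal to that frame's PTS.
    Returns (frame_type, fed_at, released_at) tuples; released_at is
    None when no packet in the list has a matching DTS.
    """
    index_by_dts = {dts: i for i, (_, dts, _) in enumerate(packets)}
    return [
        (ftype, i, index_by_dts.get(pts))
        for i, (ftype, _, pts) in enumerate(packets)
    ]


# (type, DTS, PTS) values from the question
packets = [("I", 0, 3), ("B", 1, 1), ("B", 2, 2), ("P", 3, 6)]
for ftype, fed_at, released_at in output_schedule(packets):
    print(ftype, fed_at, released_at)
```

Under this model the I frame fed as packet 0 is released when packet 3 is fed, a delay of PTS - DTS = 3 packets, and the P frame is still waiting for a packet with a DTS of 6. The model says nothing about whether the two leading B frames can actually be decoded.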

Based on this behavior, I'm guessing that av_read_frame() is maybe reordering the packets from IPBB to IBBP so that avcodec_decode_video() only has to buffer the P frames for 3 frames instead of 5. For example, the difference between the input and the output of the P frame with this ordering is 3 (6 - 3):

|                  I B B P B B P
|             DTS: 0 1 2 3 4 5 6
| decode() result:       I B B P

vs. a difference of 5 with the standard ordering (6 - 1):

|                  I P B B P B B
|             DTS: 0 1 2 3 4 5 6
| decode() result:       I B B P

<shrug/> but that is pure conjecture.

hobb0001
A: 

I'm fairly certain you are getting accurate values. It might help if you think of an MPEG stream as, well, a stream. In that case, prior to the IBBPBB that you see there would normally be another GOP. Maybe something like this (using the same notation as the original question):

P(-3,-2)  B(-2,-1)  B(-1,0)

Basically the B frames after the I frames are based on the I frame and the last P frame from the previous GOP.

While it makes logical sense for a video to start off with this:

Start GOP: IPBBPBBPBB...

Later on it must be

Start GOP: IBBPBBPBBPBB
Start GOP: IBBPBBPBBPBB
Start GOP: IBB...

Remember that decoding any B frame requires a complete frame before it and after it. So each pair of B frames should be displayed before the I or P frame just prior to it in the file.
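
With the numbers from the question, you can see this by sorting the packets by PTS, which gives the presentation order. A minimal sketch with plain tuples rather than ffmpeg structures (the helper name `presentation_order` is made up):

```python
# (type, DTS, PTS) in file order, as reported in the question
packets = [("I", 0, 3), ("B", 1, 1), ("B", 2, 2), ("P", 3, 6)]


def presentation_order(packets):
    """Return the packets sorted by PTS, i.e. in display order."""
    return sorted(packets, key=lambda p: p[2])


print([ftype for ftype, _, _ in presentation_order(packets)])
# ['B', 'B', 'I', 'P']
```

The two B frames come out ahead of the I frame that precedes them in the file.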

FFMPEG may just have forgone the "special case" of first GOP.

Since the first two B frames don't have a prior frame to manipulate, you should be able to safely discard them. Just rebase your timestamps off of the first I frame and adjust the audio stream the same amount.

Whether this will actually result in a loss of frames will depend on FFMPEG's implementation, but the worst-case scenario is that you lose 83 milliseconds (2 frames at 24 frames/sec).
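
A sketch of that rebasing, again with plain tuples and not ffmpeg's API. The helper name `rebase_on_first_i` and the audio timestamps are made up, and it assumes audio and video share one time base with one tick per video frame:

```python
def rebase_on_first_i(packets):
    """Drop frames presented before the first I frame and shift PTS.

    packets: list of (frame_type, dts, pts).
    Returns (kept, dropped, offset), where kept has its PTS shifted so
    the first I frame is presented at 0, dropped is the number of
    discarded frames, and offset is the amount subtracted.
    """
    first_i_pts = next(pts for ftype, _, pts in packets if ftype == "I")
    kept = [
        (ftype, dts, pts - first_i_pts)
        for ftype, dts, pts in packets
        if pts >= first_i_pts
    ]
    return kept, len(packets) - len(kept), first_i_pts


packets = [("I", 0, 3), ("B", 1, 1), ("B", 2, 2), ("P", 3, 6)]
kept, dropped, offset = rebase_on_first_i(packets)

# Hypothetical audio timestamps, shifted by the same amount as the video.
audio_pts = [3, 4, 5]
audio_rebased = [t - offset for t in audio_pts]

# Worst-case loss if the dropped frames really are gone, at 24 frames/sec.
lost_ms = dropped / 24 * 1000
print(kept, dropped, audio_rebased, round(lost_ms))
```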

Jere.Jones