Hi everyone,

I'm working on a source filter that feeds video/audio captured by our software through a DirectShow graph. I got the video working relatively painlessly, but adding an audio output pin is proving to be quite a challenge.

The specific question I have is: does the audio renderer modify the actual reference clock as it is playing sound? I'm seeing very jerky video playback. Attached below is a chunk of a log file, and it looks like once in a while the reference clock just "stops" while system time keeps ticking. Does that make sense?

One thing I should mention is that the audio samples are µ-law, 8 kHz, 8-bit, and each packet is exactly 120 ms. Here's the complication: when we receive audio data from the network, it doesn't come with time information, so our software assigns a sample timestamp at the moment the packet is received. Video samples get stamped by the original source, so they are accurate. If I ignore the audio sample times and simply assign sample timestamps 120 ms apart, the video plays smoothly.

The problem is that I still don't fully understand the relationship between the reference clock and the audio/video renderers. What really puzzles me is that we have another, similar source filter which plays the same data without jerky video (it doesn't have logging, and I haven't had a chance to add any to see whether the reference clock is also modified in that case).

-- Dennis

This is that piece of the log:
Sys Clock (delta)    Stream Time (delta)    Drift between clocks
15:54:40.755(0.005) 1.838 (0.005) 0.000
15:54:40.761(0.006) 1.844 (0.006) 0.000
15:54:40.889(0.128) 1.972 (0.128) 0.000
15:54:40.894(0.005) 1.977 (0.005) 0.000
15:54:40.899(0.005) 1.982 (0.005) 0.000
15:54:40.903(0.004) 1.986 (0.004) 0.000
15:54:40.931(0.028) 2.014 (0.028) 0.000
15:54:40.936(0.005) 2.019 (0.005) 0.000
15:54:41.019(0.083) 2.080 (0.061) 0.022
15:54:41.175(0.156) 2.080 (0.000) 0.178
15:54:41.181(0.006) 2.080 (0.000) 0.184
15:54:41.190(0.009) 2.080 (0.000) 0.193
15:54:41.197(0.007) 2.080 (0.000) 0.200
15:54:41.202(0.005) 2.080 (0.000) 0.205
15:54:41.210(0.008) 2.080 (0.000) 0.213
15:54:41.216(0.006) 2.080 (0.000) 0.219
15:54:41.220(0.004) 2.080 (0.000) 0.223
15:54:41.313(0.093) 2.080 (0.000) 0.316
15:54:41.317(0.004) 2.080 (0.000) 0.320
15:54:41.408(0.091) 2.116 (0.036) 0.375
15:54:41.412(0.004) 2.120 (0.004) 0.375
15:54:41.432(0.020) 2.140 (0.020) 0.375
15:54:41.436(0.004) 2.144 (0.004) 0.375
15:54:41.439(0.003) 2.147 (0.003) 0.375
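(A log like this can be produced by polling the graph clock against a system timer. A minimal sketch, with hypothetical names; it assumes pClock is the graph's IReferenceClock and rtBase is the reference time captured when the graph started running:)

// Minimal sketch of a drift-logging loop (hypothetical names; assumes
// pClock is the graph's IReferenceClock and rtBase is the reference time
// captured at run start).
#include <dshow.h>
#include <mmsystem.h>   // timeGetTime; link with winmm.lib
#include <cstdio>

void LogDrift(IReferenceClock* pClock, REFERENCE_TIME rtBase)
{
    REFERENCE_TIME rtNow = 0;
    pClock->GetTime(&rtNow);                 // reference clock, 100 ns units

    double stream = (rtNow - rtBase) / 1e7;  // stream time in seconds
    double sys    = timeGetTime() / 1000.0;  // system timer in seconds

    printf("sys=%.3f stream=%.3f\n", sys, stream);
}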

A: 

When a sound card is present in the graph, it is usually selected as the reference clock. Other filters, including the video renderer, use it to determine when to present their samples. Reading the system clock in parallel is not a good idea; everything should use the same reference clock to stay in sync.
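One way to rule the audio renderer's clock in or out is to force the whole graph onto the standard system clock via IMediaFilter::SetSyncSource. A sketch, assuming pGraph is your built graph (error handling trimmed):

#include <dshow.h>

HRESULT UseSystemClock(IGraphBuilder* pGraph)
{
    IMediaFilter* pMediaFilter = NULL;
    HRESULT hr = pGraph->QueryInterface(IID_IMediaFilter,
                                        (void**)&pMediaFilter);
    if (FAILED(hr)) return hr;

    IReferenceClock* pSysClock = NULL;
    hr = CoCreateInstance(CLSID_SystemClock, NULL, CLSCTX_INPROC_SERVER,
                          IID_IReferenceClock, (void**)&pSysClock);
    if (SUCCEEDED(hr))
    {
        // Replace the audio renderer's clock with the system clock.
        hr = pMediaFilter->SetSyncSource(pSysClock);
        pSysClock->Release();
    }
    pMediaFilter->Release();
    return hr;
}

If the jerkiness disappears with the system clock, the audio renderer's clock is the culprit. (Passing NULL to SetSyncSource instead runs the graph with no clock at all, and renderers then present samples as soon as they arrive.)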

If you know the real length of your audio samples and you're sure you don't lose any of them (e.g. you use TCP, not UDP), then simply assigning sequential 120 ms time intervals is a good solution. Taking timestamps from the system clock when a sample arrives from the network is a bad idea because it introduces random time shifts caused by network behavior - you never really know how long a packet will take to arrive.
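For illustration, sequential stamping in a push source's FillBuffer might look like this (a sketch against the DirectShow base classes; CMyAudioPin and m_rtNext are hypothetical names, with m_rtNext a REFERENCE_TIME member initialized to 0):

HRESULT CMyAudioPin::FillBuffer(IMediaSample* pSample)
{
    const REFERENCE_TIME PACKET = 120 * 10000;   // 120 ms in 100 ns units

    // ... copy one 120 ms u-law packet into the sample's buffer here ...

    REFERENCE_TIME rtStart = m_rtNext;
    REFERENCE_TIME rtStop  = rtStart + PACKET;
    pSample->SetTime(&rtStart, &rtStop);         // ignore network arrival time
    pSample->SetSyncPoint(TRUE);                 // uncompressed audio: every sample is a sync point
    m_rtNext = rtStop;                           // next packet starts where this one ends
    return S_OK;
}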

If you have two filters and want to see how their timing differs, you can install GraphEditPlus, insert a sample grabber before/after your filters, right-click it and select "watch grabbed samples". It will show all the timestamps and other info. You can also right-click the graph window and choose "see event log", which can help as well.

Dee Mon
Thank you for the reply. However, this doesn't actually answer what I'm trying to find out, which is: does the audio renderer modify the actual reference clock as it is playing sound?
DXM
For now, I am just trying to understand DirectShow and its behavior. What would it do if audio samples of a specific duration had reference time intervals that didn't match that duration? Something has to give; unless the audio is resampled, its playout duration will never change. So would the audio renderer in this case adjust the reference clock value to match the reference time of the sample being played (which would then obviously change the video renderer's playout)? This is what I seem to be seeing, but so far only with one of the two source filters I'm working with.
DXM
The solution you mentioned is one of the approaches I am considering, but my specific scenario is a bit more complicated: when my software gathers video/audio data from different devices, each device has its own clock, which never runs at exactly the same rate as the PC's. This is why everything needs to be normalized against the system clock (in a way that doesn't introduce jitter), so I'll probably end up resampling the audio stream (or introducing something similar to what VoIP uses to stretch/shorten streams).
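For what it's worth, the stretch/shrink step can be as crude as linear interpolation once the µ-law data is decoded to 16-bit PCM. A hypothetical sketch (real VoIP stacks use time-scale modification such as WSOLA for better quality):

#include <vector>
#include <cstdint>

// Stretch or shrink a block of PCM samples by `ratio` (e.g. the observed
// device-clock rate divided by the system-clock rate). ratio > 1 produces
// more output samples than input samples.
std::vector<int16_t> Stretch(const std::vector<int16_t>& in, double ratio)
{
    size_t outLen = static_cast<size_t>(in.size() * ratio);
    std::vector<int16_t> out(outLen);
    for (size_t i = 0; i < outLen; ++i)
    {
        double pos  = i / ratio;                 // fractional source index
        size_t i0   = static_cast<size_t>(pos);
        size_t i1   = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        double frac = pos - i0;
        out[i] = static_cast<int16_t>(in[i0] * (1.0 - frac) + in[i1] * frac);
    }
    return out;
}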
DXM
"What would it do if audio samples of a specific duration had reference time intervals which didn't match that duration?" - I suppose audio renderer will just ignore end time of the timestamp and use only start time and actual duration. But this needs to be tested.
Dee Mon