I have a bunch of RTP packets that I'd like to re-assemble into an audio stream. For each packet, I have the sequence number, SSRC, timestamp, and a byte array representing the data itself.

Currently I'm taking each subset of packets by their SSRC, then ordering them by timestamp and combining the byte arrays in that order. Afterwards, I'm mixing the byte arrays. The resulting audio data sounds great (by great, I mean everything is in time), but I'm worried that it's due to not having much packet loss.

So, a couple questions...

  1. For missing packets, a missing sequence number shows where I need to add a bit of empty audio. I believe the sequence number "wraps around" quite often, so I need to use timestamp to break them up into subsets. Then I can look for missing sequence numbers in those subsets and add as needed. Does that sound like the right thing to do?

  2. I haven't quite figured out what else the timestamp is good for. Since I'm recording already existing packets and filling in the missing ones, maybe I don't need to worry about this as much?

+1  A: 

1) I don't think the sequence number wraps around quickly. It is a 16-bit value, so it wraps every 65536 packets; even if a packet is sent every 10 milliseconds, that gives more than 10 minutes of transmission (65536 x 10 ms is about 655 seconds). It is very unlikely that packets will be lost for that long. So in my opinion you should only check the sequence number; checking the timestamp is pointless.

2) I don't think you need to worry much about the timestamp. I know that some protocols don't even fill in this value and rely only on the sequence number.
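To make the wrap-around arithmetic concrete, here is a minimal sketch. The function names are made up for illustration, and it assumes the 16-bit sequence numbers are supplied in roughly arrival order, so that neighbouring packets are never more than half the sequence space (32768) apart. Under that assumption each sequence number can be mapped to a non-wrapping integer, which also covers recordings longer than one wrap period.

```python
SEQ_MOD = 1 << 16      # RTP sequence numbers are 16 bits wide
HALF = SEQ_MOD // 2


def wrap_period_seconds(packet_interval_ms):
    """Seconds until the sequence number repeats at a fixed packet interval."""
    return SEQ_MOD * packet_interval_ms / 1000.0


def extend_sequence_numbers(seqs):
    """Map 16-bit sequence numbers (roughly arrival order) to non-wrapping ints.

    Hypothetical helper: assumes consecutive packets are less than
    32768 sequence numbers apart.
    """
    extended = []
    highest = None  # highest extended sequence number seen so far
    for seq in seqs:
        if highest is None:
            ext = seq
        else:
            # Signed distance from the highest value seen, in [-32768, 32767].
            delta = ((seq - highest + HALF) % SEQ_MOD) - HALF
            ext = highest + delta
        extended.append(ext)
        if highest is None or ext > highest:
            highest = ext
    return extended


if __name__ == "__main__":
    print(wrap_period_seconds(10))                       # one packet per 10 ms
    print(extend_sequence_numbers([65534, 65535, 0, 1]))
```

With extended numbers, two packets that share the same 16-bit sequence number in different wrap cycles get different values, so gaps can be found by plain integer comparison.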

Zuljin
There will definitely be more than 10 minutes of transmission, so wouldn't I have multiple packets with the same sequence number, but different timestamps?
Dan
Could you please explain in what form you have these packets? If the packets are completely unsorted, then I think your algorithm using the timestamp is OK. However, if the packets are more or less sorted by time of arrival (for example, they come from a Wireshark/tcpdump traffic dump), then in my opinion it is pointless to check the timestamp. You also didn't mention whether you are doing this in real time (you receive the stream from a socket and want to save it or play it as fast as possible) or as post-processing. In real time the algorithm should be different.
Zuljin
I'm capturing packets with winpcap. I have two functionalities for recording. The first is at some point during the stream, the user can click record. At a later point, they click stop recording and at this point, it saves packets received between the two events. The second way I'm recording is at the end of the stream, all audio data is assembled and saved to a file.
Dan
+1  A: 

1) Avoid using timestamps in your algorithm. It will fail if you receive a stream from a badly behaved client that sends improper timestamps. The timestamp increment also changes with the codec type, so you might need different subsets for different codecs. Sequence numbers have no such dependency: they increase by one for every packet sent (wrapping at 16 bits), so you can use them to track lost packets easily.

2) The timestamp is used for synchronization between audio and video, mainly for lip sync. A relationship between the audio and video timestamps is established to achieve synchronization. In your case it's only audio, so you can avoid using the timestamp.
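As a rough illustration of both points, the sketch below shows why the timestamp increment depends on the codec (it is the RTP clock rate multiplied by the packet duration) while the lost-packet count needs only the sequence number. Both function names are hypothetical, and `count_lost` assumes its two arguments come from packets that were received one after the other and in order.

```python
def timestamp_increment(clock_rate_hz, packet_ms):
    """RTP timestamp units covered by one packet of the given duration."""
    return clock_rate_hz * packet_ms // 1000


def count_lost(prev_seq, seq):
    """Packets missing between two consecutively received, in-order packets.

    The subtraction is done modulo 2**16, so a wrap from 65535 to 0
    is not mistaken for a huge gap. Assumes no reordering or duplicates.
    """
    return ((seq - prev_seq) & 0xFFFF) - 1


if __name__ == "__main__":
    print(timestamp_increment(8000, 20))    # 8 kHz clock, 20 ms packets
    print(timestamp_increment(48000, 20))   # 48 kHz clock, 20 ms packets
    print(count_lost(65534, 1))             # 65535 and 0 are missing
```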

alam
2) The sequence number increases monotonically; the timestamp does not have to. For example, with the H.264 codec, video frames can depend on future frames, which are sent first. I am not aware of an audio codec that uses a similar approach.
Boris
A: 

I think what Zuljin is getting at in his answer above is that if your packets are stored in the order they were captured then you can use some simple rules to find out-of-order packets - e.g. look back 50 packets and forward 50 packets. If it is not there then it counts as a lost packet.

This should avoid any issues with the sequence number having wrapped around. To handle any lost packets there are many techniques you can use, so it would be useful to google 'Audio packet loss' or 'VOIP packet loss concealment'. As alam mentions, the timestamp increment will vary with codec, so you need to understand this if you are going to use it.
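A minimal sketch of the windowed approach, assuming the packets are given in capture order as `(sequence_number, payload)` pairs. The function name, the `window` parameter, and the `silence` placeholder are all made up for illustration; here the window is approximated by the number of packets waiting for a missing one, and a real application would pass a silence frame that suits its codec, or use a proper concealment technique instead.

```python
SEQ_MASK = 0xFFFF  # RTP sequence numbers are 16 bits wide


def reassemble(packets, window=50, silence=b""):
    """Return payloads in sequence order, with `silence` for lost packets.

    packets: iterable of (seq16, payload) in capture order.
    A missing packet is declared lost once more than `window`
    later packets are waiting behind it.
    """
    pending = {}      # seq16 -> payload, packets waiting for their turn
    out = []
    expected = None   # next sequence number to emit

    for seq, payload in packets:
        if expected is None:
            expected = seq  # assumes the first captured packet is in order
        # Skip packets that are behind the play point (too late or duplicate).
        if ((seq - expected) & SEQ_MASK) >= 0x8000:
            continue
        pending.setdefault(seq, payload)
        # Emit while the next packet is available or the window is exceeded.
        while pending and (expected in pending or len(pending) > window):
            out.append(pending.pop(expected, silence))
            expected = (expected + 1) & SEQ_MASK

    # End of capture: flush what is left, filling remaining gaps.
    while pending:
        out.append(pending.pop(expected, silence))
        expected = (expected + 1) & SEQ_MASK
    return out


if __name__ == "__main__":
    capture = [(65534, b"a"), (0, b"c"), (65535, b"b"), (2, b"e")]
    print(b"".join(reassemble(capture, silence=b"_")))
```

Because every comparison is done modulo 2^16 relative to the next expected packet, the wrap from 65535 to 0 needs no special handling.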

You don't mention what the actual application is but if you are trying to understand what the received audio actually sounded like, you really need some more info, in particular the jitter buffer size - this effectively determines how long the receiver will wait for an out of sequence packet before deciding it is lost. What this means to you is that there may be out-of-sequence packets in your file which the 'real world' receiver would have given up and not played back - i.e. your reconstruction from the file may give a higher quality than the 'real time' experience.
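To estimate which captured packets a real receiver would have discarded, one option is to model a fixed-size jitter buffer. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the first packet is taken as the timing reference, the buffer depth is constant, and arrival times are the capture timestamps in seconds. Real receivers often use adaptive buffers, so treat the result as an approximation.

```python
def late_packets(packets, clock_rate_hz, buffer_ms):
    """Return sequence numbers that miss their playout deadline.

    packets: list of (arrival_seconds, rtp_timestamp, seq) in capture order.
    A packet's deadline is the first packet's arrival time, plus the
    buffer depth, plus its media offset from the first packet.
    """
    if not packets:
        return []
    base_arrival, base_ts, _ = packets[0]
    late = []
    for arrival, ts, seq in packets:
        # RTP timestamps are 32 bits wide; subtract modulo 2**32.
        media_offset = ((ts - base_ts) & 0xFFFFFFFF) / clock_rate_hz
        deadline = base_arrival + buffer_ms / 1000.0 + media_offset
        if arrival > deadline:
            late.append(seq)
    return late


if __name__ == "__main__":
    capture = [
        (0.000, 0, 1),
        (0.020, 160, 2),
        (0.150, 320, 3),   # arrives well after its slot
        (0.060, 480, 4),
    ]
    print(late_packets(capture, clock_rate_hz=8000, buffer_ms=60))
```

Packets reported here are present in the capture file, but a receiver with that buffer size would probably have treated them as lost.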

If it is a two way transmission, then delay is very important also (even if it is a constant delay and hence does not affect jitter and packet loss). This is the type of effect you used to get on some radio telephones and still do on some satellite phones (or VoIP phones), and it can significantly impact the user experience.

Finally, different codecs and clients may apply different techniques to correct lost packets, insert 'silent tones' for any gaps in the audio (e.g. pauses in conversation), suppress background noise etc.

To get a proper feel for the user experience you would have to try to 'replay' your captured packets as accurately as possible using the same codec, jitter buffer and any error correction/packet loss techniques the receiver used also.

Mick
I think that helps a bit. I'll try it.
Dan