I have a bunch of RTP packets that I'd like to re-assemble into an audio stream. For each packet, I have the sequence number, SSRC, timestamp, and a byte array representing the data itself.
Currently I'm taking each subset of packets by their SSRC, then ordering them by timestamp and combining the byte arrays in that order. Afterwards, I'm mixing the byte arrays. The resulting audio data sounds great (by great, I mean everything is in time), but I'm worried that it's due to not having much packet loss.
So, a couple questions...
For missing packets, a missing sequence number shows where I need to add a bit of empty audio. I believe the sequence number "wraps around" quite often, so I need to use timestamp to break them up into subsets. Then I can look for missing sequence numbers in those subsets and add as needed. Does that sound like the right thing to do?
I haven't quite figured out what else the timestamp is good for. Since I'm recording already existing packets and filling in the missing ones, maybe I don't need to worry about this as much?