Basically, this is how it's achieved.
1) Encode the video / audio using the best compression you can get. Go with lossy compression, which throws away the parts of the video and audio that the viewer won't notice, and filter out anything that isn't useful, like background hiss.
2) Pack the video / audio into packets and put a timestamp on each one. The packets are usually datagrams.
3) Send the packets directly to the destination. Use the most appropriate route. You don't have to send all packets the same way; use many routes if possible. P2P networks often use many routes to the same destination.
4) Decode at the destination. If a packet is too old, throw it away. If packets are lost, don't bother asking for them again, since by the time they arrived it would be too late.
5) Join the video back together and fill in the missing frames as best you can.
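
To make steps 2, 4 and 5 a bit more concrete, here is a rough Python sketch. It is only an illustration, not how any particular streaming product does it. The header layout (sequence number plus millisecond timestamp), the 200 ms age limit, and the function names are all made up for the example. It also assumes the sender and receiver clocks agree, and that "filling in" a missing frame just means repeating the previous one. No real network or codec is involved; the payloads stand in for encoded frames.

```python
import struct

# Hypothetical header: sequence number (uint32) + timestamp in ms (uint64),
# both in network byte order. Real protocols use their own layouts.
HEADER = struct.Struct("!IQ")


def pack_frame(seq, timestamp_ms, payload):
    """Step 2: wrap an encoded frame in a timestamped datagram."""
    return HEADER.pack(seq, timestamp_ms) + payload


def unpack_frame(datagram):
    """Split a datagram back into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(datagram)
    return seq, timestamp_ms, datagram[HEADER.size:]


def receive(datagrams, now_ms, max_age_ms=200):
    """Step 4: keep only packets that are still fresh, keyed by sequence number.

    Packets older than max_age_ms are thrown away. Lost packets simply never
    show up here, and nothing is re-requested.
    """
    fresh = {}
    for datagram in datagrams:
        seq, timestamp_ms, payload = unpack_frame(datagram)
        if now_ms - timestamp_ms <= max_age_ms:
            fresh[seq] = payload
    return fresh


def reassemble(fresh, first_seq, last_seq):
    """Step 5: put frames back in order, repeating the previous frame for gaps."""
    frames = []
    previous = b""  # nothing to repeat before the first frame arrives
    for seq in range(first_seq, last_seq + 1):
        previous = fresh.get(seq, previous)
        frames.append(previous)
    return frames
```

For example, if frames 0 and 2 arrive on time but frame 1 shows up 350 ms late, `receive` drops frame 1 and `reassemble` plays frame 0 twice in its place. A real player would use something smarter than repetition (interpolation, or the codec's own error concealment), but the shape of the logic is the same.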