Hello, I'm currently developing an application using DirectSound for communication on an intranet. I had a working solution using UDP, but then my boss told me he wants to use TCP/IP for some reason. I've tried to implement it in pretty much the same way as UDP, but with very little success. What I get is basically just noise: about 20% of it is the recorded sound and the rest is just weird noise.

My guess is that with TCP I need to read the received data several times until I have the complete sound that I can play.

Now two questions:

  • Am I on the right track? Is it even a good idea to use TCP/IP for this kind of application (voice conferencing of sorts)?
  • I'm doing it in C# but I don't think this is language specific.
+1  A: 

TCP/IP would work; it will deliver the data. It might not be quite as efficient as UDP if you were not worrying about packet loss, but you should be able to transmit the data just fine.

Mark Wilkins
+12  A: 

No, using TCP is a terrible idea. UDP in this case will perform much better and dropped / out of sync packets won't matter!

If your boss can't understand the technical details, tell him or her that virtually all existing VoIP systems use UDP, and there must be a reason: Skype, Ventrilo, TeamSpeak, World of Warcraft's voice chat, etc.

Andreas Bonini
TCP/IP is not the best option for VoIP, but it won't hurt. It's just not as efficient.
Jon Benedicto
Not as efficient = The quality will be worse = It does hurt, imo
Andreas Bonini
You can always configure TCP to do better than a NIH protocol over UDP. Unless, of course, one insists on having packet loss.
Pavel Radzivilovsky
Not as efficient != The quality will be worse. The quality is not worse, provided the network has enough bandwidth. Just about every single network has more than enough bandwidth for the overhead of TCP/IP for VoIP.
Jon Benedicto
Due to retransmissions, and because data is only delivered once all preceding packets have arrived, VoIP over TCP/IP will cause delays unless you add a significant buffer. But that is really unwanted for VoIP, where one wants low latency.
Toad
TCP goes to great lengths to give you *all* the data, even if that means that you get some later than you otherwise would. But in VOIP, late data is quite simply useless.
caf
+1  A: 

TCP/IP over modern routers and networks is very fast. It is more than capable of handling voice over IP communication. (I've done it myself)

My guess is that your implementation has some bugs in it related to buffer sizes.

Jon Benedicto
Yes, I'm pretty sure it's my implementation that's causing this. I just wanted to know if I should spend another day fixing it.
Micha
+1  A: 

There are a few main reasons why live streaming data uses UDP. The biggest is that receiving late data is as good as not receiving it at all, and delaying the stream for retransmission is certainly not a good idea. For VoIP, you have a latency tolerance of somewhere around 150 ms. Any voice packet that is delayed longer than that becomes noticeable to users.

As for why you are getting noise, how are you handling late-arriving packets due to retransmits?
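To make the "late data is as good as lost" point concrete, here is a minimal sketch of a receiver that drops voice packets older than the latency budget. The `VoicePacket` type, its fields, and the assumption that sender and receiver share a clock are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

LATENCY_BUDGET_MS = 150  # rough tolerance quoted above; tune for your application


@dataclass
class VoicePacket:
    seq: int             # hypothetical packet number
    captured_ms: float   # capture time in ms on a shared clock (an assumption)
    payload: bytes


def playable(packet, now_ms, budget_ms=LATENCY_BUDGET_MS):
    """Return True if the packet is still inside the latency budget."""
    return (now_ms - packet.captured_ms) <= budget_ms


def filter_late(packets, now_ms):
    """Keep only packets that can still be played; late ones are simply dropped."""
    return [p for p in packets if playable(p, now_ms)]
```

With UDP the application can make this decision itself. With TCP a lost segment holds back everything behind it until the retransmission arrives, so the data may already be late by the time the application sees it.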

yx
+3  A: 

When people are talking about the TCP/IP stack they often mean "the whole Internet protocol stack" which includes UDP. Maybe that makes your manager happy ;-)

johannes
My thoughts too. If your boss doesn't give any reason why it must be TCP, I'd think he believes that UDP is 'some kind of nonstandard hack', especially if he says TCP/IP instead of UDP.
Javier
He told me, "I've talked to a world-class network expert. He says that we'll be losing one third of packets with UDP." I don't know who that 'expert' might be but, well, whatever.
Micha
With voice it's better to lose some packets than to have a hell of a lot of latency.
johannes
If the boss had told the "network expert" that you were developing VoIP software, the "network expert" would have said that UDP was better and RTP was the way to go... There is something fishy about that "network expert"!
Adrien Plisson
A: 

It depends on the kind of underlying network. If you have Ethernet with 99.9% reliability, my guess is TCP would do just fine. However, if you are doing it over, say, 802.11, then TCP would not be such a good idea.

You can ask your boss for the specific reason to use TCP and then implement that particular service, for example basic reliability or error correction, over UDP. You might also like to look into RTP (http://en.wikipedia.org/wiki/Real-time%5FTransport%5FProtocol).

anijhaw
A: 

TCP should not introduce any noise. Jitter and lag, yes (especially if your links are lossy); but no noise at all. Something is fishy with your programming.

BTW, I concur that UDP is far more appropriate than TCP in this case.

Javier
A: 

Most voice applications are built using the RTP protocol, which is streamed over UDP. Most of them also have codec support to ensure the media is compressed before it is streamed from one end to the other. Discuss the bandwidth requirements with your boss.

A: 

I'm pretty sure most streaming audio/video uses UDP...you might lose a few packets, but you would never notice.

smoore
+1  A: 

There is no reason why you should be getting noise over TCP, so it looks like a bug in your code. In fact most streaming media we receive (think YouTube) is delivered over TCP.

The problem with TCP is jitter. Delivery of your data stream will be delayed until all of the packets have been received and reordered. Since late delivery for multimedia is as good as no delivery at all, this is normally a poorer choice than simply interpolating the missing frame. As mentioned above, if packet loss is minimal and your network is fast, it should make no difference.

RTP/RTCP over UDP is normally used for delivery of the media stream. RTP includes things like sequence numbers in the packet header that allow for insertion of late packets into their correct position, where possible. RTCP has a reporting function that allows the codec to adapt to situations where packet loss starts to increase. RTP/RTCP therefore provides some but not all TCP functionality.

For streaming media over TCP, this can be solved easily by having a large jitter buffer. This adds latency, but for one-way streaming this is not a problem. Latency, however, is a major problem in two-way conversational streaming.

One main advantage of TCP, though, is that it traverses firewalls more easily than UDP. Once a TCP session is established, the firewall is open to both send and receive data. This is more complicated for UDP, especially when one is expecting an incoming stream of data. There are ways around this, but they can be complicated and may involve understanding the session control protocol (like SIP or RTSP).
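As a rough illustration of what sequence numbers buy you, here is a small reorder buffer. It is not RTP itself: the class name, the `max_waiting` limit, and the lack of sequence-number wraparound handling are simplifying assumptions.

```python
import heapq


class ReorderBuffer:
    """Releases payloads in sequence order and skips a gap once too many packets wait."""

    def __init__(self, max_waiting=4):
        self.max_waiting = max_waiting  # hypothetical limit on buffered packets
        self.next_seq = 0               # next sequence number due for playback
        self.heap = []                  # min-heap of (seq, payload)

    def push(self, seq, payload):
        """Add one packet; return the payloads that are now ready, in order."""
        if seq < self.next_seq:
            return []                   # too late: that slot was already played or skipped
        heapq.heappush(self.heap, (seq, payload))
        ready = []
        while self.heap:
            head_seq, head_payload = self.heap[0]
            if head_seq < self.next_seq:
                heapq.heappop(self.heap)        # stale duplicate, discard
            elif head_seq == self.next_seq:
                heapq.heappop(self.heap)
                ready.append(head_payload)
                self.next_seq += 1
            elif len(self.heap) > self.max_waiting:
                self.next_seq = head_seq        # give up on the missing packet(s)
            else:
                break                           # wait a little longer for the gap to fill
        return ready
```

A packet that arrives slightly out of order is slotted back in, while a packet that never arrives only stalls playback until `max_waiting` later packets have queued up. A real implementation would more likely bound the wait by time rather than by packet count.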

doron
+3  A: 

To answer this question correctly I feel that some of the key concepts of VoIP need to be explained.

Firstly, UDP is the most popular and widely used transport for VoIP. Remember that an IP network is packet switched: it is ideal for non-real-time data communication and was not designed for real-time VoIP.

To overcome this problem, UDP is used. UDP is an unreliable, connectionless protocol. Although UDP will lose packets, the speech audio can still be understood; the brain will effectively compensate for the errors. That's why you can still speak to someone on a phone with 3 bars of signal.

Packet Loss and Burst Lengths

Packet loss often occurs due to congestion, so the amount of packet loss will depend on how well equipped the network is. Packet loss in VoIP using UDP will most often occur in bursts. A burst length is the number of packets lost in succession during transmission, so a burst length of 3 means 3 packets in a row were lost.

Packet Loss Compensation

Where packet loss occurs, simple packet loss compensation techniques will suffice and the Quality of Service will not be seriously affected; speech can still be understood even in cases where 20-30% of packets are lost. Methods include:

  1. Repeat the last successfully received packet.

  2. Fill in - Play silence in the gap.

  3. Splicing - Effectively, this can be thought of as removing the gap caused by the burst by pushing the start and end of the gap together.

  4. Interpolation - Use knowledge of the speech before and after to interpolate lost packets within the gap, e.g. the mean of the packets successfully received before and after the burst.

A good method of reducing the size of burst lengths, and thus increasing QoS, is known as interleaving. A block interleave function takes the speech and splits it into a set of packets. These packets are loaded into a buffer in the shape of a matrix (e.g. 4 by 4), and a function is used to rotate or transpose the buffer so the packets are no longer in order. On the receiver side the inverse of this function is used to re-order the packets. This method is simple and effective. See the figure below:
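The four methods above can be sketched roughly as follows. This assumes frames are equal-length lists of signed PCM samples, that a lost frame is represented as `None`, and that at least one frame was received; the function name and the `method` strings are made up for the example.

```python
SILENCE = 0  # sample value for silence in signed PCM (assumed format)


def conceal(frames, method="repeat"):
    """Fill in lost frames (None entries) in a list of equal-length sample lists."""
    frame_len = len(next(f for f in frames if f is not None))
    out = []
    last_good = None
    for i, frame in enumerate(frames):
        if frame is not None:
            last_good = list(frame)
            out.append(last_good)
            continue
        if method == "splice":
            continue  # drop the gap entirely, joining its two ends
        # Look ahead for the next received frame (needs buffering in a live system).
        next_good = next((f for f in frames[i + 1:] if f is not None), None)
        if method == "repeat" and last_good is not None:
            out.append(list(last_good))
        elif method == "interpolate" and last_good is not None and next_good is not None:
            # Sample-wise mean of the frames received before and after the burst.
            out.append([(a + b) // 2 for a, b in zip(last_good, next_good)])
        else:
            out.append([SILENCE] * frame_len)  # "fill in": play silence
    return out
```

Note that interpolation has to wait for the frame after the gap, so it costs some latency, whereas repeating the last frame or playing silence can be done immediately.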

[Figure: block interleaving of packets (image not available)]
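A block interleaver along these lines might look like the sketch below. It assumes the "transpose" variant (write the block row by row, send it column by column); the function names are illustrative.

```python
def interleave(packets, rows=4, cols=4):
    """Write packets row by row into a rows x cols block, then read column by column."""
    if len(packets) != rows * cols:
        raise ValueError("expected exactly rows * cols packets")
    return [packets[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(packets, rows=4, cols=4):
    """Inverse of interleave(): restore the original packet order."""
    if len(packets) != rows * cols:
        raise ValueError("expected exactly rows * cols packets")
    return [packets[c * rows + r] for r in range(rows) for c in range(cols)]


# Example: a burst of 3 consecutive losses on the wire...
sent = interleave(list(range(16)))
sent[4:7] = [None, None, None]
# ...becomes three isolated single-packet losses after de-interleaving.
received = deinterleave(sent)
lost_positions = [i for i, p in enumerate(received) if p is None]
```

Here `lost_positions` comes out as `[1, 5, 9]`, so each gap is only one packet long and is much easier to conceal. The trade-off is that the receiver has to collect a whole block before it can restore the order, which adds delay.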

I recently created a small VoIP app over a wireless LAN using UDP. I am not really sure of the exact requirements of your application, but generally VoIP applications (between two hosts) can be implemented as follows:

[Figure: VoIP application between two hosts (image not available)]

In the diagram the application defines its own packet design. The header could just be the packet number (using 1 byte) and the payload the audio data (n bytes, the size of the payload). Defining this allows better packet compensation techniques and allows for a logical flow for programming.

TCP is a bad choice for VoIP for several reasons. A quick Google search for 'TCP VoIP' reveals why, with the first result backing this view.
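A packet layout like that (1-byte packet number followed by the audio payload) could be packed and parsed as in this sketch. The helper names are made up, and the number simply wraps around at 256.

```python
import struct

HEADER = struct.Struct("!B")  # one unsigned byte: the packet number (wraps at 256)


def pack_packet(seq, audio):
    """Build a datagram: 1-byte packet number followed by the raw audio bytes."""
    return HEADER.pack(seq % 256) + audio


def unpack_packet(datagram):
    """Split a datagram back into (packet number, audio bytes)."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram too short to contain a header")
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]
```

With the packet number in hand, the receiver can tell which packets are missing or out of order, which is what the compensation techniques above rely on.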

TCP is a reliable, connection-oriented protocol. This means that packets which are lost in transmission will at some point be resent by the other host. This retransmission is impractical for real-time services and will increase jitter and latency, and in some cases possibly packet loss.

Answers to Your Questions

What I get is basically just noise. 20% of it is the recorded sound and the rest is just weird noise.

TCP should not introduce noise; it should introduce jitter and latency. Sockets tend to have an automatically defined time-out. Do you define the time-out yourself? If not, what happens when you do not receive the correct packet in time for playback?

Am I on the right track? Is it even a good idea to use TCP/IP for this kind of application (voice conferencing of sorts)?

No, do NOT use TCP/IP; it is not a good idea. It appears that your manager has incorrectly assumed that any packet loss is a terrible thing.

Summary

Some general key concepts have been shown here to help as much as possible with this specific problem; however, this should not be considered exhaustive. Make sure the VoIP system also uses some underlying speech coding/signal processing techniques.

The key points to remember are:

  • Use UDP for VoIP.

  • Implement packet loss compensation
    techniques.

  • A block interleaver is a simple and
    effective method to increase QoS.

I hope this helps.

Graham
A: 

If you're getting noise, you're probably overrunning the part of your buffer that has successfully filled with packets, and playing an empty/uninitialized buffer.
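One common way this happens when moving from UDP to TCP (a guess at the cause, not a diagnosis of the asker's code): a UDP receive returns one whole datagram, but TCP is a byte stream, so a single receive call may return only part of an audio frame. A minimal sketch of reading a full frame before playing it, with a hypothetical fixed frame size:

```python
import socket

FRAME_BYTES = 512  # hypothetical fixed audio frame size


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or return None if the peer closed first."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:
            return None               # connection closed before a full frame arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The playback loop would then call `recv_exact(sock, FRAME_BYTES)` and hand only complete frames to the sound buffer, rather than playing whatever a single receive happened to fill.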

Dean J
A: 

I have developed a voice over IP solution for duplex communication, using the wave API, for the remote control of an amateur radio transceiver. It works very well with UDP and also with TCP/IP! I use a 512-byte buffer every 64 ms, 8 kHz mono wave data. Over the last month I have worked between the USA and Europe very well over TCP/IP! Now my question: the wave API does not work correctly with Windows 7, therefore I think DirectSound is the better way. At the moment I am having trouble with the implementation under Managed DirectX 9; my application is VB.Net 2008. I am looking for links to documentation for streaming output with DirectSound (Managed DirectX 9) for VB.Net.
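For reference, taking those figures at face value, the buffer size works out to one byte per sample and a raw audio rate of 64 kbit/s before any protocol overhead. The variable names below are only for illustration.

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz mono
BUFFER_BYTES = 512      # bytes per buffer
BUFFER_MS = 64          # one buffer every 64 ms

samples_per_buffer = SAMPLE_RATE_HZ * BUFFER_MS // 1000   # samples covered by one buffer
bytes_per_sample = BUFFER_BYTES // samples_per_buffer     # implies 8-bit samples
bitrate_bps = BUFFER_BYTES * 8 * 1000 // BUFFER_MS        # raw payload bit rate
```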

roland