Hi,

I'm working on a UDP multicast library and have a question about how to properly handle link failures, disconnected/reconnected NIC cables, etc.

In my test I have the following setup:

  • 2 servers sA and sB
  • sA is sending UDP multicast data and sB is receiving multicast data
  • servers are connected through a Layer 2 Cisco gigabit switch

As an example, when I join the multicast group on sB I start receiving data on that socket from sA's multicast packets.

Now, when I disable/unplug the NIC that the multicast receiver on sB is bound to, I don't receive any socket-level errors (e.g. in Socket.ReceiveAsync). I guess that is expected, as UDP is connectionless, yet I was hoping to get some kind of notification/exception when the IP that the multicast receiver is bound to becomes unavailable.

Anyway, when I re-enable that NIC, I don't receive any more data, although the sender is still sending to the same multicast group. I was hoping that the kernel would handle rejoining the multicast group after a hardware link failure, but it looks like it doesn't. And since I'm not getting any socket-level errors either, I don't really know how to detect a link failure for a multicast receiver. Are there certain socket options that need to be set so that the kernel rejoins a multicast group? The only option I have come up with so far is listening for System.Net.NetworkInformation.NetworkChange.NetworkAddressChanged events and attempting to rebind when I get a notification that the local IP I have to bind to is available again. How do other multicast applications handle this scenario?

Thanks,

Tom

A: 

I can't go into detail, because how my company's protocols work is a company secret, but program a periodic heartbeat between your server and your clients. Your software can then track internally when the last heartbeat arrived and deduce whether you have suffered some sort of network/hardware failure.

There are plenty of options you can play around with to try to detect that a failure occurred, including checking NetworkAddressChanged, but it will be safer to implement the heartbeat, because it's a generic solution that is easy to implement and should cover nearly all cases.
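As a rough illustration of the "time when the last heartbeat arrived" idea, here is a minimal sketch. The class name `HeartbeatMonitor`, its members, and the injectable clock are all hypothetical and not taken from any real protocol. It only reports that the sender has been silent for too long; it cannot tell why. It is also not thread-safe as written.

```csharp
using System;

// Hypothetical helper: remembers when the last heartbeat arrived and reports
// whether the silence has lasted longer than the allowed timeout.
public sealed class HeartbeatMonitor
{
    private readonly TimeSpan _timeout;
    private readonly Func<DateTime> _clock;
    private DateTime _lastSeen;

    // Uses the system UTC clock.
    public HeartbeatMonitor(TimeSpan timeout)
        : this(timeout, () => DateTime.UtcNow)
    {
    }

    // Accepts a custom clock, which makes the class easy to test.
    public HeartbeatMonitor(TimeSpan timeout, Func<DateTime> clock)
    {
        _timeout = timeout;
        _clock = clock;
        _lastSeen = clock();
    }

    // Call this from the receive path whenever a heartbeat packet arrives.
    public void OnHeartbeat()
    {
        _lastSeen = _clock();
    }

    // True when no heartbeat has been seen for longer than the timeout.
    public bool IsStale
    {
        get { return _clock() - _lastSeen > _timeout; }
    }
}
```

A receiver would call `OnHeartbeat()` for every heartbeat packet and poll `IsStale` from a timer; what to do once it turns true is up to the application.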

Andrew Keith
Well, my application-level protocol is sending/consuming heartbeats, but that doesn't really help in my question's scenario. When the receiver application is not receiving any heartbeats, there are multiple scenarios that could have happened:

  • The sender application might simply have gone dead
  • An intermediary switch in a multi-switched network might have gone dead
  • Switch overload, etc. causing excessive packet loss
  • etc.

To be continued in next comment...
Tom Frey
In none of these cases would the receiver have any control over what action to take: rejoining the multicast group would cause an exception, as the socket still has a valid binding, and if any of the above issues got resolved, the receiver would receive data again without having to take any action. However, when there is a physical link break, etc., the corrective course of action has to be different, and from my testing the socket doesn't rejoin the multicast group automatically. So unless I'm missing something here, heartbeats don't really take care of that scenario.
Tom Frey
+2  A: 

I recommend you subscribe to the following event: System.Net.NetworkInformation.NetworkChange.NetworkAvailabilityChanged

When the event reports that the network has gone offline, have your event handler gracefully reset the receiver. Conversely, when it reports that the network is available again, rebind your receiver.
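The reset/rebind handler described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RejoiningMulticastReceiver` and its members are hypothetical names, the receive loop is left out, and the retry policy is deliberately trivial. Note also that, as far as I know, NetworkAvailabilityChanged reflects the machine's overall availability, so on a host with several active NICs unplugging just one may not raise it; in that case NetworkAddressChanged may be the better trigger.

```csharp
using System;
using System.Net;
using System.Net.NetworkInformation;
using System.Net.Sockets;

// Hypothetical wrapper: owns the receiving socket and recreates it, including
// the multicast group membership, whenever the network comes back.
public sealed class RejoiningMulticastReceiver : IDisposable
{
    private readonly IPAddress _group;
    private readonly IPAddress _localAddress;
    private readonly int _port;
    private readonly object _gate = new object();
    private Socket _socket;

    public RejoiningMulticastReceiver(IPAddress group, IPAddress localAddress, int port)
    {
        _group = group;
        _localAddress = localAddress;
        _port = port;
        NetworkChange.NetworkAvailabilityChanged += OnAvailabilityChanged;
    }

    // The currently bound socket, or null while the network is down.
    public Socket Current
    {
        get { lock (_gate) { return _socket; } }
    }

    public void Start()
    {
        lock (_gate) { Open(); }
    }

    private void OnAvailabilityChanged(object sender, NetworkAvailabilityEventArgs e)
    {
        lock (_gate)
        {
            Close(); // always drop the old socket; its membership is stale
            if (!e.IsAvailable) return;
            try
            {
                Open();
            }
            catch (SocketException)
            {
                // The local address may not be assigned yet; a real
                // implementation would schedule a retry here.
            }
        }
    }

    private void Open()
    {
        var s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        try
        {
            s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            s.Bind(new IPEndPoint(_localAddress, _port));
            s.SetSocketOption(SocketOptionLevel.IP, SocketOptionName.AddMembership,
                new MulticastOption(_group, _localAddress));
        }
        catch
        {
            s.Dispose();
            throw;
        }
        _socket = s;
    }

    private void Close()
    {
        if (_socket != null)
        {
            _socket.Dispose();
            _socket = null;
        }
    }

    public void Dispose()
    {
        NetworkChange.NetworkAvailabilityChanged -= OnAvailabilityChanged;
        lock (_gate) { Close(); }
    }
}
```

Because the socket is replaced on every rebind, any pending receive on the old socket will fail when it is disposed, and the receive loop has to be restarted against `Current`.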