When sending messages from a self-hosted WCF service to many clients (about 10 or so), messages are sometimes delayed much longer than I'd expect (several seconds to reach a client on the local network). Does anyone have an idea why this happens and how to fix it?


Some background: the application is a stock-ticker-style service. It receives messages from a third-party server and re-publishes them to clients that connect to the service. It's very important that messages are published as quickly as possible, and in most cases the time between receiving a message and publishing it to all clients is under 50 ms (so short that it approaches the resolution of DateTime.Now).

Over the past few weeks, we've observed several occasions when messages were delayed by 2 or 3 seconds. A few days ago, we got a big spike, and messages were delayed by 40-60 seconds. As far as I can tell, messages are not being dropped (unless the entire connection is dropped). The delays do not appear to be specific to any one client; they affect all clients (including ones on the local network).

I send messages to the clients by spamming the ThreadPool: as soon as messages arrive, I call BeginInvoke() on a delegate once per message per client. The theory is that if any one client is slow to receive a message (because it's on dial-up and downloading updates, say), it won't impact the other clients. That isn't what I'm observing, though: all clients (including ones on the local network) appear to be delayed by a similar amount.

The volume of messages I'm dealing with is 100-400 per second. Messages contain a string, a GUID, a date and, depending on the message type, 10-30 integers. In Wireshark, each one is under 1 kB. We have 10-20 clients connected at any one time.

The WCF server is hosted in a Windows service on Windows Server 2003 Web Edition. I'm using the NetTCP binding with SSL/TLS encryption enabled and custom username/password authentication. The server has a 4 Mbit internet connection, a dual-core CPU and 1 GB of RAM, and is dedicated to this application. The service is set to ConcurrencyMode.Multiple. Even under high load, the service process rarely exceeds 20% CPU usage.
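As a rough worst-case check on the figures above, assume (hypothetically, and pessimistically) that every connected client subscribes to every contract and every message is a full 1 kB, ignoring TCP/TLS overhead. The outbound volume would then be:

```latex
\begin{aligned}
\text{low:}\quad  & 100\,\tfrac{\text{msg}}{\text{s}} \times 10\ \text{clients} \times 1\,\text{kB} = 1000\,\tfrac{\text{kB}}{\text{s}} \approx 8\,\tfrac{\text{Mbit}}{\text{s}} \\
\text{high:}\quad & 400\,\tfrac{\text{msg}}{\text{s}} \times 20\ \text{clients} \times 1\,\text{kB} = 8000\,\tfrac{\text{kB}}{\text{s}} \approx 64\,\tfrac{\text{Mbit}}{\text{s}}
\end{aligned}
```

Subscription filtering and messages smaller than 1 kB reduce the real figure, and clients on the local network do not share the internet link, so this only bounds the traffic to remote clients.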

So far, I've tweaked various WCF configuration options such as:

  • serviceBehaviors/serviceThrottling/maxConcurrentSessions (currently 102)
  • serviceBehaviors/serviceThrottling/maxConcurrentCalls (currently 64)
  • bindings/netTcpBinding/binding/maxConnections (currently 100)
  • bindings/netTcpBinding/binding/listenBacklog (currently 100)
  • bindings/netTcpBinding/binding/sendTimeout (currently 45s, although I've tried it as high as 3 minutes)
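For reference, the settings listed above would sit in the service's configuration roughly as follows. This is a sketch: the behavior name `PublisherBehavior` is hypothetical, and the SSL/TLS and custom username/password security settings are left out because the post does not show them.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "Tcp" is the bindingConfiguration name used by the endpoint. -->
      <binding name="Tcp"
               maxConnections="100"
               listenBacklog="100"
               sendTimeout="00:00:45">
        <!-- security settings omitted -->
      </binding>
    </netTcpBinding>
  </bindings>
  <behaviors>
    <serviceBehaviors>
      <!-- Hypothetical behavior name. -->
      <behavior name="PublisherBehavior">
        <serviceThrottling maxConcurrentCalls="64"
                           maxConcurrentSessions="102" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```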

It appears to me that messages are being queued inside WCF once some threshold is reached (which is why I've been increasing the throttling limits). But for that to affect all clients, one or two slow clients would have to max out all outgoing connections. Does anyone know whether WCF works that way internally?

I could also improve efficiency by coalescing incoming messages before sending them to the clients. However, I suspect something more fundamental is going on, and coalescing won't fix the problem in the long term.
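A minimal sketch of coalescing, assuming the callback contract gained a hypothetical batch operation (not part of the posted code): on each timer tick, drain everything queued since the last tick under a single lock, then make one call per client with the whole batch instead of one call per message. `MessageCoalescer`, `Enqueue` and `DrainAll` are illustrative names only.

```vbnet
Imports System
Imports System.Collections.Generic

' Illustrative only: buffers messages between timer ticks so that each
' client can receive one batched call per tick instead of one per message.
Public Class MessageCoalescer(Of T)
    Private ReadOnly _queue As New Queue(Of T)()

    ' Called by the feed handler for every incoming message.
    Public Sub Enqueue(ByVal item As T)
        SyncLock _queue
            _queue.Enqueue(item)
        End SyncLock
    End Sub

    ' Called on each timer tick: removes and returns everything queued so far,
    ' in arrival order, while holding the lock once.
    Public Function DrainAll() As List(Of T)
        SyncLock _queue
            Dim batch As New List(Of T)(_queue)
            _queue.Clear()
            Return batch
        End SyncLock
    End Function
End Class
```

With the existing 50 ms timer, the server would then make at most 20 calls per second per client (filtering each batch by that client's subscriptions first), rather than one call per message per client.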

WCF Config (with company names changed):

    <system.serviceModel>
      ...
      <host>
        <baseAddresses>
          <add baseAddress="net.tcp://localhost:8100/Publisher"/>
        </baseAddresses>
      </host>

      <endpoint address="ThePublisher"
                binding="netTcpBinding"
                bindingConfiguration="Tcp"
                contract="Company.Product.Server.Publisher.IPublisher" />
      ...
      </behavior>
      ...
    </system.serviceModel>

Code used to send messages:

    Private Sub HandleDataBackground(ByVal sender As Object, ByVal e As Timers.ElapsedEventArgs)
        If Me._FeedDataQueue.Count > 0 Then
            ' Dequeue any items received in the last 50ms.
            While True
                Dim dataAndReceivedTime As DataWithReceivedTimeArg
                SyncLock Me._FeedDataQueue
                    If Me._FeedDataQueue.Count = 0 Then Exit While
                    dataAndReceivedTime = Me._FeedDataQueue.Dequeue()
                End SyncLock

                ' Publish data to all clients.
                Me.SendDataToClients(dataAndReceivedTime)
            End While
        End If
    End Sub

    Private Sub SendDataToClients(ByVal data As DataWithReceivedTimeArg)
        ' Take a snapshot of the usable clients subscribed to this contract.
        Dim clientsToReceive As IEnumerable(Of ClientInformation)
        SyncLock Me._ClientInformation
            clientsToReceive = Me._ClientInformation.Values.Where(Function(c) Contract.CollectionContains(c.ContractSubscriptions, data.Data.Contract) AndAlso c.IsUsable).ToList()
        End SyncLock

        ' Start one asynchronous delegate call per client (no completion callback or state object).
        For Each clientInfo In clientsToReceive
            Dim futureChangeMethod As New InvokeClientCallbackDelegate(Of DataItem)(AddressOf Me.InvokeClientCallback)
            futureChangeMethod.BeginInvoke(clientInfo, data.Data, AddressOf Me.SendDataToClient, Nothing, Nothing)
        Next
    End Sub

    Private Sub SendDataToClient(ByVal callback As IFusionIndicatorClientCallback, ByVal data As DataItem)
        ' Send the item over the client's WCF callback channel.
        callback.ReceiveData(data)
    End Sub

    Private Sub InvokeClientCallback(Of DataT)(ByVal client As ClientInformation, ByVal data As DataT, ByVal method As InvokeClientCallbackMethodDelegate(Of DataT))
        Try
            If client.IsUsable Then
                ' Send the data, then record the time of last successful contact.
                method(client.CallbackObject, data)
                client.LastContact = DateTime.Now
            Else
                ' Make sure the callback channel has been removed.
                SyncLock Me._ClientInformation
                    Me._ClientInformation.Remove(client.SessionId)
                End SyncLock
            End If
        Catch ex As CommunicationException
            ....
        Catch ex As ObjectDisposedException
            ....
        Catch ex As TimeoutException
            ....
        Catch ex As Exception
            ....
        End Try
    End Sub

A sample of one of the message types:

    <DataContract(), KnownType(GetType(DateTimeOffset)), KnownType(GetType(DataItemDepth)), KnownType(GetType(DataItemDepthDetail)), KnownType(GetType(DataItemHistory))> _
    Public MustInherit Class DataItem
        Implements ICloneable

        Protected _Contract As String
        Protected _MessageId As Guid
        Protected _TradeDate As DateTime

        <DataMember()> _
        Public Property Contract() As String
            ...
        End Property

        <DataMember()> _
        Public Property MessageId() As Guid
            ...
        End Property

        <DataMember()> _
        Public Property TradeDate() As DateTime
            ...
        End Property

        Public MustOverride Function Clone() As Object Implements System.ICloneable.Clone
    End Class

    <DataContract()> _
    Public Class DataItemDepth
        Inherits DataItem

        Protected _VolumnPriceDetail As IList(Of DataItemDepthItem)

        <DataMember()> _
        Public Property VolumnPriceDetail() As IList(Of DataItemDepthItem)
            ...
        End Property

        Public Overrides Function Clone() As Object
            ...
        End Function
    End Class

    <DataContract()> _
    Public Class DataItemDepthItem
        Protected _Volume As Int32
        Protected _Price As Int32
        Protected _BidOrAsk As BidOrAsk ' BidOrAsk is an Int32 enum
        Protected _Level As Int32

        <DataMember()> _
        Public Property Volume() As Int32
            ...
        End Property

        <DataMember()> _
        Public Property Price() As Int32
            ...
        End Property

        <DataMember()> _
        Public Property BidOrAsk() As BidOrAsk  ' BidOrAsk is an Int32 enum
            ...
        End Property

        <DataMember()> _
        Public Property Level() As Int32
            ...
        End Property
    End Class

A:

After a lengthy support case with Microsoft, we managed to identify the issue.

Calling WCF channel methods through a delegate's BeginInvoke/EndInvoke (the asynchronous delegate pattern) actually results in synchronous calls, not asynchronous ones.

The correct way to call WCF methods asynchronously is any mechanism except asynchronous delegates: for example, the thread pool, raw threads or WCF async callbacks.
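A sketch of the thread-pool option, assuming .NET 4 or later (for `CountdownEvent` and multi-line lambdas). `IClientCallback`, `ThreadPoolPublisher` and `CountingCallback` are hypothetical stand-ins; in the real service the callback type would be something like `IFusionIndicatorClientCallback`. Each send is queued with `ThreadPool.QueueUserWorkItem` instead of calling `BeginInvoke` on a delegate.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

' Hypothetical stand-in for the WCF callback contract.
Public Interface IClientCallback
    Sub ReceiveData(ByVal data As String)
End Interface

Public Module ThreadPoolPublisher
    ' Queues one thread-pool work item per client, so a slow client
    ' occupies only its own work item.
    Public Sub SendToAll(ByVal clients As IEnumerable(Of IClientCallback), ByVal data As String)
        For Each client As IClientCallback In clients
            ' Copy the loop variable so each lambda captures its own client
            ' (For Each capture semantics differ before VB 11).
            Dim target As IClientCallback = client
            ThreadPool.QueueUserWorkItem(
                Sub(state)
                    Try
                        target.ReceiveData(data)
                    Catch ex As Exception
                        ' In the real service: handle CommunicationException,
                        ' TimeoutException, etc., and drop the client.
                    End Try
                End Sub)
        Next
    End Sub
End Module

' Example callback for local testing: signals once per message received.
Public Class CountingCallback
    Implements IClientCallback

    Public ReadOnly Done As CountdownEvent

    Public Sub New(ByVal expected As Integer)
        Done = New CountdownEvent(expected)
    End Sub

    Public Sub ReceiveData(ByVal data As String) Implements IClientCallback.ReceiveData
        Done.Signal()
    End Sub
End Class
```

The Try/Catch inside the work item matters: since .NET 2.0, an unhandled exception on a thread-pool thread terminates the process.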

In the end I used WCF async callbacks (which can be applied to a callback interface, although I couldn't find specific examples of that).
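A sketch of the async-callback approach under several assumptions; this is not the answerer's actual code. WCF declares asynchronous operations with `OperationContract(AsyncPattern:=True)` on a `Begin...`/`End...` method pair, and it derives the operation name by dropping the `Begin` prefix, so `BeginReceiveData` describes a `ReceiveData` operation. The server-side interface name `IFusionIndicatorClientCallbackAsync` and module `AsyncCallbackSender` are hypothetical; the contract `Name` is set so that it is assumed to match what the client implements; `data` is typed as `String` only to keep the sketch self-contained (the real service would use `DataItem`). The service contract would also need to declare this interface as its `CallbackContract`. Requires a reference to System.ServiceModel.

```vbnet
Imports System
Imports System.ServiceModel

' Hypothetical server-side view of the callback contract, using the async pattern.
<ServiceContract(Name:="IFusionIndicatorClientCallback")>
Public Interface IFusionIndicatorClientCallbackAsync
    <OperationContract(AsyncPattern:=True)>
    Function BeginReceiveData(ByVal data As String, ByVal callback As AsyncCallback, ByVal state As Object) As IAsyncResult
    Sub EndReceiveData(ByVal result As IAsyncResult)
End Interface

Public Module AsyncCallbackSender
    ' Starts the send and returns immediately; completion runs on another thread.
    Public Sub SendAsync(ByVal client As IFusionIndicatorClientCallbackAsync, ByVal data As String)
        client.BeginReceiveData(data, AddressOf OnSendCompleted, client)
    End Sub

    Private Sub OnSendCompleted(ByVal result As IAsyncResult)
        Dim client As IFusionIndicatorClientCallbackAsync = DirectCast(result.AsyncState, IFusionIndicatorClientCallbackAsync)
        Try
            ' EndReceiveData surfaces any failure from the send.
            client.EndReceiveData(result)
        Catch ex As CommunicationException
            ' Mark the client as unusable and remove it here.
        Catch ex As TimeoutException
            ' Same handling as above.
        End Try
    End Sub
End Module
```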

The following link makes this more explicit: http://blogs.msdn.com/drnick/archive/2007/06/12/begininvoke-bugs.aspx