I have this situation: client-initiated SOAP 1.1 communication between one server and, let's say, tens of thousands of clients. Clients are external, coming in through our firewall, authenticated by certificate, HTTPS, etc. They can be anywhere, and usually have their own firewalls, NAT routers, etc. They're truly external, not just remote corporate offices. They could be on a corporate/campus network, DSL/cable, or even dial-up.

Client uses Delphi (2005 + SOAP fixes from 2007), and the server is C#, but from an architecture/design standpoint, that shouldn't matter.

Currently, clients push new data to the server and pull new data from the server on a 15-minute polling loop. The server does not push data. Instead, the client calls the "messagecount" method to see if there is new data to pull. If the count is 0, the client sleeps for another 15 minutes and checks again.

We're trying to get that down to 7 seconds.

If this were an internal app with one or just a few dozen clients, we'd write a client "listener" SOAP service and push data to it. But since the clients are external, sit behind their own firewalls, and are sometimes on private networks behind NAT routers, this is not practical.

So we're left with polling on a much quicker loop. 10K clients, each checking their messagecount every 10 seconds, works out to 1,000 requests/sec. Most of those requests will just waste bandwidth as well as server, firewall, and authenticator resources.
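A quick back-of-the-envelope check of these numbers, assuming each client's polls are spread evenly over its interval. The client count and intervals come from the question; the function names are only illustrative:

```python
def polls_per_second(clients: int, interval_s: float) -> float:
    """Average request rate when every client polls once per interval."""
    return clients / interval_s


def polls_per_day(clients: int, interval_s: float) -> int:
    """Total requests per day at that rate (86,400 seconds per day)."""
    return round(clients * 86_400 / interval_s)


# Today's 15-minute loop, the 10 s example, and the 7 s target.
for interval in (15 * 60, 10, 7):
    print(f"{interval:>4} s: {polls_per_second(10_000, interval):8.1f} req/s, "
          f"{polls_per_day(10_000, interval):,} req/day")
```

Moving from 15 minutes to 7 seconds multiplies the request volume by roughly 130, and nearly all of those requests would return a count of 0.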

So I'm trying to design something better than what would amount to a self-inflicted DoS attack.

I don't think it's practical to have the server send soap messages to the client (push) as this would require too much configuration at the client end. But I think there are alternatives that I don't know about. Such as:

1) Is there a way for the client to make a GetMessageCount() request via SOAP 1.1, get the response, and then "stay on the line" for perhaps 5-10 minutes to receive additional responses in case new data arrives? For example, the server says "0". A minute later, in response to some SQL trigger (the server is C# on SQL Server, btw), it knows that this client is still "on the line" and sends the updated message count of "5".

2) Is there some other protocol that we could use to "ping" the client, using information gathered from their last GetMessageCount() request?

3) I don't even know. I guess I'm looking for some magic protocol where the client can send a GetMessageCount() request, which would include info for "oh by the way, in case the answer changes in the next hour, ping me at this address...".

Also, I'm assuming that any of these "keep the line open" schemes would seriously impact the server sizing, as it would need to keep many thousands of connections open, simultaneously. That would likely impact the firewalls too, I think.

Is there anything out there like that? Or am I pretty much stuck with polling?

TIA,
Chris

UPDATE 4/30/2010:
Having demonstrated that 7-second notification is neither easy nor cheap, especially without going outside the corporate standard of HTTPS/SOAP/firewalls, we're probably going to pitch a two-phase solution. Phase 1 will have the clients poll on demand, with GetMessageCount performed through SOAP. Nothing fancy here. There will be a "refresh" button to pull new data. This is reasonable here, because the user will usually have reason to suspect that new data is ready. For example, they just changed the fabric color in the online system, so they know to click REFRESH before viewing the shipping manifest on the desktop, and now they see the color in the description. (This is NOT really a garment/fashion app, but you get the idea.) Keeping the two apps always in sync, with real-time updates pushed from the host, is still on the table using the technologies discussed here. But I expect that it will be pushed off to another release, since we can deliver 85% of the functionality without it. However, I hope that we get to do a proof of concept and can demonstrate that it'll work. I'll come back and post future updates. Thanks for everyone's help on this.

+3  A: 

I would have a look at kbmMW

I would possibly use a method similar to MS Exchange. The client connects and authenticates via TCP/IP. The server then notifies the client of update(s) via UDP. When the client receives the UDP notification, it downloads the update via TCP/IP.

(At least that's how I understand MS Exchange works)
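A minimal sketch of the notify-then-fetch pattern, in Python purely for illustration. It assumes the client has a UDP port the server can actually reach, which NAT and client-side firewalls often prevent (see the comments below). The 4-byte payload is a hypothetical format. This is not how Exchange or kbmMW actually implement it.

```python
from __future__ import annotations

import socket
import struct

# Hypothetical wire format: the datagram carries only the new message count
# as a 4-byte big-endian unsigned integer. The data itself never goes over UDP.
NOTIFY_FORMAT = "!I"


def send_notification(client_addr: tuple[str, int], message_count: int) -> None:
    """Server side: fire-and-forget datagram telling a client to go fetch."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(struct.pack(NOTIFY_FORMAT, message_count), client_addr)


def wait_for_notification(sock: socket.socket, timeout_s: float) -> int | None:
    """Client side: block until a notification arrives or the timeout expires.

    Returns the announced count, or None on timeout. UDP is lossy, so a real
    client would still keep an occasional slow poll as a fallback.
    """
    sock.settimeout(timeout_s)
    try:
        data, _ = sock.recvfrom(64)
    except socket.timeout:
        return None
    if len(data) != struct.calcsize(NOTIFY_FORMAT):
        return None  # ignore malformed datagrams
    (count,) = struct.unpack(NOTIFY_FORMAT, data)
    return count


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # the OS picks a free port
    send_notification(listener.getsockname(), 5)
    count = wait_for_notification(listener, timeout_s=2.0)
    if count:
        print(f"{count} new messages: now fetch them over the TCP/SOAP channel")
    listener.close()
```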

MarkRobinson
Nice - I'll look into it. I like the part about: "using Publish/Subscribe based messaging via multiple types of fully developer accessible and configurable message queues."
Chris Thornton
Your UDP notification sounds just like what I'm after. Do you know if it will be able to come back through the same firewall port? I'm hoping for zero configuration here (zero on top of what they already need for SOAP over HTTPS).
Chris Thornton
I have great love for kbmMW - it's a bit of a learning curve, but then it kind of just makes sense - free basic version too!
MarkRobinson
I think that because the client opens the connection and doesn't disconnect, there should be a UDP route back.
MarkRobinson
Just to add something possibly useful: the last time I used kbmMW, the customer had a very, very poor link and also had to run VoIP. When the client first connected, I set the refresh time to 30 seconds plus a random 1-15 seconds. I suppose you could give each client a refresh interval of between 3 and 7 seconds (changing on every connect) to help ensure that the clients don't all connect at the same time.
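The randomized refresh time is a form of jitter. A minimal sketch in Python, where the function name and parameters are illustrative rather than part of kbmMW:

```python
from __future__ import annotations

import random


def next_poll_delay(base_s: float, min_extra_s: float, max_extra_s: float,
                    rng: random.Random | None = None) -> float:
    """Return base_s plus a uniformly random extra delay in [min_extra_s, max_extra_s].

    Picking a fresh value on every (re)connect spreads clients out, so they
    don't all poll in the same second, e.g. right after a network outage.
    """
    source = rng if rng is not None else random
    return base_s + source.uniform(min_extra_s, max_extra_s)


# The kbmMW setting described above: 30 s plus a random 1-15 s.
print(next_poll_delay(30, 1, 15))
# The suggested 3-7 s range for this question.
print(next_poll_delay(3, 0, 4))
```

Passing a seeded `random.Random` makes the delays reproducible in tests. The module-level generator is fine in production.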
MarkRobinson
Using UDP in addition to TCP may require an additional firewall configuration.
Kevin Panko
+2  A: 

The two big players in multi-tier development in Delphi are components4developers (with their kbmMW product, described in the answer by Mark Robinson) and RemObjects with their RemObjects SDK (they have a nice example that might be similar to what you want: Push notifications for iPhone).

In your complex environment, multicast UDP might not cut it, but from an overhead perspective it is unbeatable.

If a connection is open, it can be used in a bi-directional way (this is also used by .NET remoting and WCF), but has additional overhead.

You will need to find a balance between keeping connections live (locking resources), and creating new connections (costing time and latency).

--jeroen

Jeroen Pluimers
Thanks - we actually have a license for RemObjects here, it was bought for a project that didn't get off the ground. I'll dig it up and check out their example.
Chris Thornton
@Chris: If you solve your problem, let us know how you solved it!
Jeroen Pluimers
Accepted, as this will likely be our solution. I wish I could split the bounty across this and several others recommending RemObjects and kbmMW. Thanks all!
Chris Thornton
@Chris: I wasn't aware of the bounty. Thanks. If someone knows a way for me to give part of the bounty to others, let me know: I don't mind sharing it.
Jeroen Pluimers
+1  A: 

The push notifications for iPhone only work if your remote devices are iPhones. The only other options are to keep a connection open (although mostly idle) or to poll from the client.

You might be able to reduce the overhead of polling by simplifying the call. Use a simple web action that returns the highest message number to the client, and have the client perform a simple HTTP GET to retrieve this number. This reduces bandwidth and keeps things simple. If the client then needs updated data, it can make a full SOAP call.
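A minimal sketch of that split, using Python's standard library purely for illustration. The `/maxmsg` path, the `client` query parameter and the in-memory dict are hypothetical stand-ins for the real web action and database. In production this check would presumably still sit behind HTTPS and authentication.

```python
from __future__ import annotations

import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for "highest message number per client" in the database.
HIGHEST_MESSAGE_ID: dict[str, int] = {"client-42": 17}


class MaxMessageHandler(BaseHTTPRequestHandler):
    """Answers GET /maxmsg?client=<id> with a bare integer in plain text."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        client_id = parse_qs(url.query).get("client", [""])[0]
        if url.path != "/maxmsg" or client_id not in HIGHEST_MESSAGE_ID:
            self.send_error(404)
            return
        body = str(HIGHEST_MESSAGE_ID[client_id]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", "no-store")  # never serve a stale count
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


def needs_full_sync(base_url: str, client_id: str, last_seen: int) -> tuple[bool, int]:
    """Cheap check: True only when the server has a newer message than last_seen."""
    with urllib.request.urlopen(f"{base_url}/maxmsg?client={client_id}", timeout=5) as resp:
        highest = int(resp.read().decode("ascii"))
    return highest > last_seen, highest


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), MaxMessageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    changed, highest = needs_full_sync(base, "client-42", last_seen=15)
    if changed:
        print(f"new messages up to #{highest}: make the full SOAP call now")
    server.shutdown()
```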

skamradt
+1  A: 

Anytime you have one server and 10,000+ clients and you need updates every few seconds, you are going to run into issues. I would get a few more servers and keep each client connected via a background thread. That thread connects once and then waits for notices from the server, with a built-in keep-alive mechanism.

If you are trying to push from the server to a client that isn't currently connected, then good luck if you have no control over the clients' environments. It sounds to me like you are forced into client-initiated connections.
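A minimal sketch of such a background thread, in Python purely for illustration. The newline-delimited protocol and the `PING` heartbeat line are assumptions, not part of any product mentioned here. The client detects a silently dead connection by the absence of heartbeats and reconnects on its own.

```python
from __future__ import annotations

import queue
import socket
import threading

KEEPALIVE = "PING"  # hypothetical heartbeat line the server sends periodically


def notification_loop(host: str, port: int, notices: queue.Queue, stop: threading.Event,
                      idle_timeout_s: float = 30.0, retry_delay_s: float = 5.0) -> None:
    """Body of the client's background thread.

    Keeps one client-initiated TCP connection open and forwards every line
    except the heartbeat to `notices` for the UI thread. If nothing arrives
    within idle_timeout_s (not even a heartbeat), the connection is assumed
    dead, e.g. because a NAT router dropped it, and is reopened.
    """
    while not stop.is_set():
        try:
            with socket.create_connection((host, port), timeout=idle_timeout_s) as sock:
                with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
                    for line in reader:
                        line = line.strip()
                        if line and line != KEEPALIVE:
                            notices.put(line)
                        if stop.is_set():
                            return
        except OSError:
            pass  # refused, reset or timed out: wait a bit, then reconnect
        stop.wait(retry_delay_s)
```

The server side must then send a heartbeat more often than `idle_timeout_s`, and ideally more often than typical NAT idle timeouts.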

Darian Miller
+3  A: 

You could make a call to the server and have the server wait for some time (1 minute?) until there are updates. This way you don't need a connection from the server back to the client, and the client gets almost instant results: if updates arrive within the minute, the server ends the waiting call early. It is relatively easy and, I believe, widely used by web apps. Gmail, for example, keeps a background connection like this, so a new email shows up in your inbox instantly. I use something like this (RemObjects):

function TLoggingViewService.GetMessageOrWait: TLogMessageArray;
begin
  Result := nil; // a managed (dynamic array) result is not guaranteed to start out empty
  if (Session.Outputbuffer.Count > 0) or
     //or wait till new message (max 3s)
     Session.Outputbuffer.WaitForNewObject(3 * 1000)
  then
    //get all messages from list (without wait)
    Result := PeekMessage;
end;

The downside is that you keep the connection open for a relatively long time (what if the connection is lost because of Wi-Fi problems?), and server load is high. Each connection has a thread and is kept open, so with many clients you can run out of resources.

We use RemObjects here with TCP + BinMessage, which has MUCH lower overhead than SOAP + HTTP and is really fast! So if you can use that, I really recommend it (in your case you'd need RemObjects for Delphi and RemObjects for .NET). Only use SOAP if you need to connect third parties, and only use HTTP if you need it because of the internet/firewalls. SOAP is nice, but it has high overhead and performance issues.

You can also combine these: a simple, low-overhead (RemObjects) TCP connection in a background thread that polls every 10 s, with the server waiting up to 5 s for new data on each call.
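The server half of that combination can be sketched in Python, used only for illustration since the answer's actual code is the RemObjects/Delphi function above. A per-session `queue.Queue` stands in for `Session.Outputbuffer`, and `max_wait_s` plays the role of the 3 s or 5 s wait. Both names are hypothetical.

```python
from __future__ import annotations

import queue


def get_messages_or_wait(outbox: queue.Queue, max_wait_s: float) -> list:
    """Server-side handler body, modeled on the GetMessageOrWait idea above.

    Blocks for at most max_wait_s until at least one message is queued for
    this session, then drains everything already queued without waiting.
    Returns an empty list on timeout; the client simply calls again.
    """
    try:
        first = outbox.get(timeout=max_wait_s)
    except queue.Empty:
        return []
    messages = [first]
    while True:
        try:
            messages.append(outbox.get_nowait())
        except queue.Empty:
            return messages
```

Because the wait is bounded, each request ties up a server thread for at most `max_wait_s`. This is the trade-off between latency and held resources discussed above.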

André
+4  A: 

Consider "playing" the HTTP protocol a bit to get what you want while still being able to go over all of the proxies and NAT's and firewalls one might have on the client side.

Have every single client make a plain HTTP request for the message count in a way that defeats any caching (example: GET http://yourserver.org/getcount/nodeid/timeofday/sequence). In the server-side implementation, delay the answer if the count is the same as it used to be (i.e., no new messages).

I've done this for an Ajax-style application that ran in a browser and behaved a bit like a chat application, but your solution can be even faster. I implemented the server side using Indy's TIdHTTPServer, which let me delay the answer to the client simply by Sleep()-ing in the request's thread. From the client side, it looked like a server that is sometimes really slow to answer.

Pseudocode for the server-side stuff:

function ClientHasMessages(ClientID:Integer; MaxWait:TDateTime):Boolean;
// MaxWait is a TDateTime span (a fraction of a day), e.g. EncodeTime(0, 0, 30, 0) for 30 seconds
var MaxTime:TDateTime;
begin
  if ClientActuallyHasMessage(ClientID) then Result := True
  else
    begin
      MaxTime := Now + MaxWait;
      while Now < MaxTime do
      begin
        if ClientActuallyHasMessage(ClientID) then
          begin
            Result := True;
            Exit;
          end
        else
          Sleep(1000);
      end;
      Result := False; // TimeOut
    end;
end;

The idea behind this code: It runs in a thread on your own server, where it can test the message count, presumably, for very little cost:

  • It causes no network traffic while waiting.
  • It uses no CPU while Sleeping.
  • It will let the user know about their messages very quickly.
  • It lets the client control how long the wait might be (the client will increase the amount of time the server may delay the answer until it no longer receives the answer, and then step back a bit - that way the protocol adapts to whatever buggy NAT router the client uses).
  • You can go through long periods with no TCP/IP traffic and still provide the answer instantly. 30 seconds is easily done, and for clients with good NAT routers it can be much longer.
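The client-side tuning in the fourth bullet (grow the wait until answers stop arriving, then step back) can be sketched as follows, in Python for illustration only. The class name and all default numbers are assumptions, not values from the answer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AdaptiveWait:
    """Client-side choice of the MaxWait it asks the server to hold a request."""
    current_s: float = 30.0     # MaxWait to request on the next call
    step_s: float = 15.0        # growth after each call that completes normally
    backoff: float = 0.8        # step back to 80% of a wait that failed
    floor_s: float = 10.0
    ceiling_s: float = 600.0
    known_bad_s: float | None = None  # smallest wait that has failed so far

    def on_success(self) -> None:
        """The server answered (messages or a normal timeout): probe a little longer."""
        limit = self.ceiling_s if self.known_bad_s is None else self.known_bad_s * self.backoff
        self.current_s = max(self.floor_s, min(self.current_s + self.step_s, limit))

    def on_failure(self) -> None:
        """The connection died while waiting (e.g. NAT dropped it): remember and step back."""
        failed = self.current_s
        self.known_bad_s = failed if self.known_bad_s is None else min(self.known_bad_s, failed)
        self.current_s = max(self.floor_s, failed * self.backoff)
```

As written, the learned limit never relaxes. A real client might forget `known_bad_s` now and then in case the user moves to a network with a better NAT router.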

The downside is the load this puts on the server, but I'm tempted to say it's manageable:

  • The server's TCP/IP implementation needs to track quite a large number of simultaneous connections (every client will have an HTTP request active at all times). My Linux NAT machine is tracking 15K connections right now and it's basically idle, so it might work.
  • The server would have a thread open for every single client HTTP request, at all times. Again, the Server 2008 "Workstation" I'm using to write this (thank you MSDN for allowing me to do such outrageous things) has about 1500 threads active, and it's also basically idle...
  • Depending on the technology you use for the server-side code MEMORY might be the limiting factor.
Cosmin Prund
+2  A: 

I've done performance testing on systems even larger than your 10K clients, and when you reach the mentioned amounts of requests/sec you will most likely face issues with Connections/sec, Concurrent open connections, firewalls becoming slow etc. (Much the same issues a Torrent Tracker might face).

If the clients only need to "ask if there is anything new" the lightest protocol that is easy to implement is UDP, the next lightest would be pure TCP, both using Indy clients.

The protocol itself could actually be as simple as sending "Anything new since [yyyy-mm-dd hh:mm:ss]" to the server, and it replying with a 1 byte number (256 answers possible).
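A minimal sketch of that request/reply exchange over UDP, in Python for illustration. It assumes a hypothetical ASCII request of the form `client-id|yyyy-mm-dd hh:mm:ss` and the 1-byte reply described above, so counts above 255 are clamped. There is no authentication, and a lost datagram simply means the client asks again on its next cycle.

```python
from __future__ import annotations

import socket
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S"

# Hypothetical backlog: client id -> timestamps of messages waiting for it.
PENDING: dict[str, list[datetime]] = {}


def count_since(client_id: str, since: datetime) -> int:
    return sum(1 for ts in PENDING.get(client_id, []) if ts > since)


def serve_one(sock: socket.socket) -> None:
    """Answer a single 'anything new since' datagram with a 1-byte count."""
    data, addr = sock.recvfrom(512)
    client_id, _, stamp = data.decode("ascii").partition("|")
    since = datetime.strptime(stamp, STAMP)
    count = min(count_since(client_id, since), 255)  # one byte holds 0..255
    sock.sendto(bytes([count]), addr)


def anything_new(server: tuple[str, int], client_id: str, since: datetime,
                 timeout_s: float = 2.0) -> int | None:
    """Client side: returns the count, or None if the reply never arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(f"{client_id}|{since.strftime(STAMP)}".encode("ascii"), server)
        try:
            reply, _ = sock.recvfrom(1)
        except socket.timeout:
            return None  # lost datagram: just ask again next cycle
        return reply[0]
```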

With TCP you'd have the added benefit of keeping the "pipe" open for a few minutes, and you can send the "anything new" message every x seconds. Also with TCP the server could "push" info to the pipe (client(s)) when something happens, given that the client is checking for data in the pipe periodically.

K.Sandell
Since an IP packet has quite some overhead it doesn't really make sense to reply with a single byte. If you cause a packet to be sent, simply send back what's necessary to transfer the information - 4 or 8 bytes instead of 1 for example won't make *that* much difference...
mghie
Quite right. So one could just have the client connect, listen for something for a while, and disconnect if nothing's there... The real trick is choosing good values for how long to stay connected and when to reconnect.
K.Sandell
Along these lines, I'm thinking that maybe we should set up a second "messagecount" server, outside the firewall. The main server would push message counts to it when they change. Then the clients would connect to the MessageCount server using one of these other methods. Even if we resorted to http get calls, they would not be bothering our internal firewalls and authenticators, as presumably, the messagecount could be represented in such a way that it would not need to be secured. Interesting!
Chris Thornton
+1  A: 

I would try to distribute the load as much as possible between several servers. For that, I'd do the following:

  1. Clients register with your service in order to get notifications. They get a session ID that is valid for a given amount of time (15 minutes).
  2. Servers will run a periodic check about which registered client has incoming message and generate a list of such clients (technically, I'd push that into a different DB altogether in your DMZ).
  3. You run a number of "push notification" servers that follow a very, very simple protocol. They receive a URL query containing the session ID and respond with a short HTTP response: either a 404 or a 200 with a (signed) URL of the SOAP server to address in order to grab the messages. For additional performance, you can use HTTP 1.1 and persistent connections.
  4. Clients poll these push notification servers as often as they want. Since these servers are very simple and strictly read-only, they can answer queries very fast and will be easy to scale out.
  5. If the client receives a 200 response, it can then connect to the SOAP server at the returned URL (you can also use that for load distribution if you want) and pull the messages.
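Steps 1 to 4 can be sketched as a small in-memory store, in Python for illustration. The class, its methods and the explicit `now` parameter (to keep it testable) are hypothetical. In the design above, this data would live in the DMZ database that the notification servers read from.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field

SESSION_TTL_S = 15 * 60  # step 1: registrations are valid for 15 minutes


@dataclass
class NotificationStore:
    """Stand-in for the read-only status data the notification servers query."""
    sessions: dict[str, float] = field(default_factory=dict)  # session id -> expiry time
    ready: dict[str, str] = field(default_factory=dict)       # session id -> SOAP URL

    def register(self, now: float) -> str:
        """Step 1: a client registers and receives a session ID."""
        session_id = secrets.token_urlsafe(16)
        self.sessions[session_id] = now + SESSION_TTL_S
        return session_id

    def mark_ready(self, session_id: str, soap_url: str) -> None:
        """Step 2: the periodic check flags a session that has incoming messages."""
        self.ready[session_id] = soap_url

    def lookup(self, session_id: str, now: float) -> tuple[int, str | None]:
        """Steps 3-4: the HTTP status and URL a notification server would return."""
        expiry = self.sessions.get(session_id)
        if expiry is None or now >= expiry or session_id not in self.ready:
            return 404, None
        return 200, self.ready[session_id]
```

Here an expired session also yields 404, so the client cannot tell "nothing new" from "please re-register". A real protocol might use a distinct status code for the latter.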

You'll have to be careful about security here. First, I suggest that you do NOT use HTTPS for your push notification servers. Instead, you can sign the content of the response with a session key exchanged when the client requested notifications. The client is then responsible for validating the answer. Don't forget that you need to sign not only the status but also the SOAP service URL.
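One way to do that signing is an HMAC over the session ID, status and URL together. The sketch below uses Python's standard `hmac` module. The message layout and function names are assumptions, and how the session key is exchanged is out of scope. A MAC alone does not stop someone from replaying an old, validly signed response. Including a sequence number or timestamp in the signed message would address that.

```python
from __future__ import annotations

import hashlib
import hmac


def sign_status(session_key: bytes, session_id: str, status: int, soap_url: str) -> str:
    """Notification server: MAC over everything the client will act on."""
    message = f"{session_id}\n{status}\n{soap_url}".encode("utf-8")
    return hmac.new(session_key, message, hashlib.sha256).hexdigest()


def verify_status(session_key: bytes, session_id: str, status: int,
                  soap_url: str, signature: str) -> bool:
    """Client: accept the response only if status AND URL match the MAC."""
    expected = sign_status(session_key, session_id, status, soap_url)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```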

It's a bit complex but by decoupling the status and actual message traffic, you can scale your solution out much more easily. Also, you will not need to go through the expensive SSL negotiation until you actually want to exchange data.

Stephane
+1  A: 

We use RemObjects SDK "events" for this, but this may not be suitable for you because

a: It only works with RemObjects own binary protocol, not SOAP (ie the clients must incorporate the RO code)

b: Basically it's a "keep the line open" approach, so scalability to 10K clients is a potential issue.

I'd try some tests just to see what overhead keeping 10K sockets open actually has. If all you need is a couple of gigs of extra server memory, that's going to be a cheap fix. And because the socket is opened from the client end, it shouldn't cause firewall issues. The worst the firewall can do is to close the socket, so your client would need to reopen it when that happens.
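For that kind of test, note that holding many idle sockets does not require a thread per connection. The Python sketch below (illustration only, not RemObjects code) holds every connection in one event loop using the standard `selectors` module. The protocol is hypothetical: the client sends its ID as the first line and then waits, and the server writes a line `NEW <count>` when something changes.

```python
from __future__ import annotations

import selectors
import socket


class IdleConnectionHub:
    """Holds many idle, client-initiated TCP connections on a single thread."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.sel = selectors.DefaultSelector()
        self.listener = socket.create_server((host, port))
        self.listener.setblocking(False)
        self.sel.register(self.listener, selectors.EVENT_READ, data=None)
        self.clients: dict[str, socket.socket] = {}

    @property
    def address(self) -> tuple[str, int]:
        return self.listener.getsockname()

    def poll_once(self, timeout_s: float = 0.1) -> None:
        """One round of the event loop: accept, read client IDs, detect hang-ups."""
        for key, _ in self.sel.select(timeout_s):
            if key.data is None:
                self._accept_pending()
            else:
                self._read(key.fileobj, key.data)

    def _accept_pending(self) -> None:
        while True:
            try:
                conn, _ = self.listener.accept()
            except BlockingIOError:
                return
            conn.setblocking(False)
            self.sel.register(conn, selectors.EVENT_READ, data=b"")  # bytes: ID not yet known

    def _read(self, conn: socket.socket, state: bytes | str) -> None:
        try:
            chunk = conn.recv(1024)
        except BlockingIOError:
            return  # spurious wake-up
        except OSError:
            chunk = b""
        if not chunk:  # orderly close or reset: forget this client
            self._drop(conn)
            return
        if isinstance(state, bytes):  # still collecting the ID line
            state += chunk
            if b"\n" in state:
                state = state.split(b"\n", 1)[0].decode("ascii")
                self.clients[state] = conn
            self.sel.modify(conn, selectors.EVENT_READ, data=state)
        # once identified, further input (e.g. keep-alives) is ignored

    def notify(self, client_id: str, count: int) -> bool:
        conn = self.clients.get(client_id)
        if conn is None:
            return False
        try:
            conn.sendall(f"NEW {count}\n".encode("ascii"))
        except OSError:
            self._drop(conn)
            return False
        return True

    def _drop(self, conn: socket.socket) -> None:
        self.sel.unregister(conn)
        self.clients = {cid: c for cid, c in self.clients.items() if c is not conn}
        conn.close()
```

Scaling the client count in a harness like this (and raising the OS file-descriptor limits as needed) gives a rough idea of the per-connection memory cost.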

Roddy
Thanks, this is the approach I'm leaning towards, but with a twist: I'm thinking of having a separate server just for the notifications. The notifications don't contain the actual data payload (that would stay in SOAP, with HTTPS, certificate authentication through the firewall, etc.). They only carry a client ID and a message count, so security isn't as much of a concern. We could therefore put this server in the DMZ or even use external hosting. cont...
Chris Thornton
... So now the question becomes "how many servers do you need to handle 10K RemObjects notification pipes?" instead of "how much $$$ will it cost to support 10K SOAP requests, via HTTPS, with client cert authentication, over our ESB, all the way to our backend SQL Server database?"
Chris Thornton
@Roddy, can I ask how many clients you DO keep open simultaneously? And is there a sample project that you'd recommend I look at, using the RO SDK? I'm hoping to get some time to prototype it in a few weeks (buried with production stuff right now).
Chris Thornton