
We recently went into private beta with our flagship product and held a small launch event. Unfortunately, the venue had a terrible wireless connection and packets were being dropped left, right and centre, causing havoc with our system; it basically couldn't work at all! Luckily we were able to switch to a different network and rescue the demo. This highlighted an issue I already knew about but whose scale I hadn't appreciated. Our system relies heavily on BOSH and has a rather large JavaScript code base, which now works rather well under good network conditions. However, we need it to work well under bad network conditions too.

Because XMPP is essentially a fire-and-forget system, it's not easy to tell whether a message you sent, or were supposed to receive, was actually delivered. For instance, we have an offer system in which one user sends an offer to another over BOSH. When the server receives this message, it publishes a message to the offering user's offers_sent PEP node and a similar message to the receiving user's offers_received PEP node. While the sending user can tell (relatively) easily whether their offer was sent, if the notification to the receiving user never arrives, that user will never know they missed a message.

A little about our JavaScript setup: it has 4 main layers:

  1. StropheJS
  2. An MVC framework that handles low-level tasks and that the rest of the code builds on
  3. An application layer that contains the app logic (routes, controllers, models, etc.) as well as a browser cache of the model data
  4. A UI layer that receives events from, and publishes events to, the application layer
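To make the layering concrete, here is a minimal sketch of how layers 3 and 4 might talk through a publish/subscribe bus. Everything in it (EventBus, makeAppLayer, the "offer:received" event name, the example JID) is hypothetical and not taken from the original code base. The point it illustrates is that the application layer writes to its cache before notifying the UI, so a message is only safe once both steps have run.

```javascript
// Hypothetical sketch of the boundary between layer 3 (application) and
// layer 4 (UI). None of these names come from the original code base.
class EventBus {
  constructor() {
    this.handlers = new Map(); // event name -> list of callbacks
  }

  subscribe(event, handler) {
    if (!this.handlers.has(event)) this.handlers.set(event, []);
    this.handlers.get(event).push(handler);
  }

  publish(event, payload) {
    // Iterate over a copy so a handler that subscribes during dispatch is safe.
    for (const handler of [...(this.handlers.get(event) || [])]) handler(payload);
  }
}

// Application layer: store incoming data in the browser cache first, then
// tell the UI. A message that never reaches handleOffer (dropped packet,
// page unloaded mid-processing) reaches neither the cache nor the UI.
function makeAppLayer(bus, cache) {
  return {
    handleOffer(offer) {
      cache.set(offer.id, offer);
      bus.publish("offer:received", offer);
    },
  };
}

// UI layer: only listens to the bus and never talks to Strophe directly.
const bus = new EventBus();
const cache = new Map();
const rendered = [];
bus.subscribe("offer:received", (offer) => rendered.push(offer.id));

const app = makeAppLayer(bus, cache);
app.handleOffer({ id: "offer-1", from: "bob@example.com" });
```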

One way to solve the missing-messages issue would be to periodically check the PEP nodes for new data that the browser doesn't know about. If a new message were discovered, the browser's cache would be invalidated and all new data requested from the server. I'm not sure this is the best approach, and it doesn't cover every situation. We certainly don't want to end up sending messages to confirm that the previous message reached its destination, as this would double the network traffic.
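The polling idea could look roughly like the sketch below. It assumes a hypothetical fetchLatestItemIds function that returns the newest item id on each node (one possible way is an items request limited with max_items, see XEP-0060 section 6.5), plus hypothetical invalidateCache and refetchAll hooks in the application layer. Only the comparison step is exercised here; how often to poll is left open.

```javascript
// Compare the newest item id the browser has seen for each PEP node with the
// newest id the server reports, and return the nodes that are out of date.
function findStaleNodes(knownIds, serverIds) {
  return Object.keys(serverIds).filter((node) => knownIds[node] !== serverIds[node]);
}

// One polling pass. fetchLatestItemIds, invalidateCache and refetchAll are
// hypothetical hooks supplied by the application layer.
async function pollOnce(knownIds, fetchLatestItemIds, invalidateCache, refetchAll) {
  const serverIds = await fetchLatestItemIds(Object.keys(knownIds));
  const stale = findStaleNodes(knownIds, serverIds);
  if (stale.length > 0) {
    invalidateCache(stale); // drop what the browser thinks it knows
    await refetchAll(stale); // and reload those nodes from the server
  }
  return stale;
}

// Example: the browser missed one notification on offers_received.
const known = { offers_sent: "item-7", offers_received: "item-3" };
const server = { offers_sent: "item-7", offers_received: "item-4" };
const staleNodes = findStaleNodes(known, server);
```

In a page this would be driven by a timer calling pollOnce; the cost is one request per node per interval even when nothing has changed, which is the traffic trade-off the question is worried about.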

With the number of real-time websites growing daily, other developers must have run into this issue, and it would be interesting to see how they have solved it. As far as I can see, there are two situations in which messages go missing:

  1. On poor connections, messages are not sent or received because packets are dropped
  2. When navigating between pages, a message is received by the browser but not fully processed and stored in the local cache before the page is unloaded, or a message is added to the send queue but never sent before the page is unloaded

I suspect the hardest issue to solve will be number 2. Any thoughts on the subject would be much appreciated.

A: 

There's no perfect solution for this, but there is a workable one.

BOSH sessions remain valid only while the client keeps making requests: if no new request arrives within the inactivity timeout (60 seconds by default in most implementations), the session expires, the emulated client-to-server (c2s) connection is closed and the user has to log in again.

While the session is valid, no messages should be lost or arrive out of order. The only potential for loss is during the sixty-second window allowed for HTTP to reopen a connection, and, as mentioned, if that window closes a new session has to be created. If a new HTTP request is made within that window, nothing will be lost or arrive out of order.
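The timing rule above can be sketched as a small decision function. This is an illustration only: recoveryAction and its parameters are hypothetical, and the 60-second default is the common value mentioned above. The value your connection manager actually uses is reported, in seconds, in the 'inactivity' attribute of its session creation response (XEP-0124).

```javascript
// Common default inactivity period, in seconds.
const DEFAULT_INACTIVITY_SECONDS = 60;

// idleSinceMs: when the client last had no request outstanding (ms timestamp).
// nowMs: the current time (e.g. Date.now()) when the network comes back.
function recoveryAction(idleSinceMs, nowMs, inactivitySeconds = DEFAULT_INACTIVITY_SECONDS) {
  if (nowMs - idleSinceMs <= inactivitySeconds * 1000) {
    // Inside the window: send the next request (next rid) and the connection
    // manager delivers whatever it has queued for this session, in order.
    return "resume";
  }
  // Outside the window: the session has expired, so log in again and
  // rebuild client state, e.g. from the PEP nodes.
  return "new-session";
}

console.log(recoveryAction(0, 30 * 1000));      // → resume
console.log(recoveryAction(0, 61 * 1000));      // → new-session
console.log(recoveryAction(0, 61 * 1000, 120)); // → resume
```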

Since you're using PEP as your store, I would suggest adding a hook to the client so that whenever a session is created it fetches the items from your PEP nodes to initialize your client-side cache (see section 6.5 of XEP-0060).
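A sketch of that hook, building the "retrieve items" request from XEP-0060 section 6.5 as a plain string so the example stays self-contained. In a real client you would more likely build the stanza with Strophe's builders and send it over the existing connection. onSessionCreated, the id scheme and the JID alice@example.com are hypothetical; the node names are the ones from the question.

```javascript
const PUBSUB_NS = "http://jabber.org/protocol/pubsub";

// Escape a value for use inside a single-quoted XML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;");
}

// XEP-0060 "retrieve items" request for one node. For PEP the node lives on
// the user's own bare JID.
function itemsRequest(ownerJid, node, id) {
  return (
    `<iq type='get' to='${escapeAttr(ownerJid)}' id='${escapeAttr(id)}'>` +
    `<pubsub xmlns='${PUBSUB_NS}'><items node='${escapeAttr(node)}'/></pubsub>` +
    `</iq>`
  );
}

// Hypothetical hook: call it from whatever code runs when a new BOSH session
// is established; `send` stands in for your connection's send method.
function onSessionCreated(ownerJid, nodes, send) {
  nodes.forEach((node, i) => send(itemsRequest(ownerJid, node, `init-${node}-${i}`)));
}

const sent = [];
onSessionCreated("alice@example.com", ["offers_sent", "offers_received"], (s) => sent.push(s));
```

The results of these requests would then replace the client-side cache wholesale, which is why the only cost is the extra start-up lag mentioned below.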

Messages can still be lost if your BOSH client receives them successfully but the web page is closed or reloaded before they can be processed. Under all other conditions, however, you should no longer see any data loss, only some additional lag at start-up while the items are retrieved.

Brian Cully
We have implemented a temporary fix that has made our system far more stable: we request all data from the PEP and PubSub nodes on every page load. This isn't ideal, as it increases traffic, but it does work. In the future we are going to implement BOSH acknowledgements (see http://xmpp.org/extensions/xep-0124.html#acks), which will allow us to ensure message delivery. As this requires work on both the client and the server side, we will implement it later, once we need this optimisation.
JamieD
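The BOSH acknowledgement scheme mentioned in the comment above needs some client-side bookkeeping. The sketch below keeps every request body until the connection manager confirms it. It assumes the cumulative reading of the 'ack' value (every request up to and including that rid has been received); check XEP-0124 and your server's behaviour before relying on it. AckTracker and the placeholder bodies are hypothetical, not a real library API.

```javascript
// Client-side record of BOSH requests that have not yet been acknowledged.
class AckTracker {
  constructor() {
    this.pending = new Map(); // rid -> request body awaiting acknowledgement
  }

  // Remember each request body as it is sent, keyed by its rid.
  sent(rid, body) {
    this.pending.set(rid, body);
  }

  // Assumes cumulative acks: drop every request with rid <= ackRid.
  // (Deleting from a Map while iterating its keys is safe in JavaScript.)
  acknowledged(ackRid) {
    for (const rid of this.pending.keys()) {
      if (rid <= ackRid) this.pending.delete(rid);
    }
  }

  // Requests that still need resending (same rid, same body), in rid order.
  unacknowledged() {
    return [...this.pending.entries()]
      .sort((a, b) => a[0] - b[0])
      .map(([rid, body]) => ({ rid, body }));
  }
}

// Placeholder bodies; real ones carry sid, xmlns and the queued stanzas.
const tracker = new AckTracker();
tracker.sent(1001, "<body rid='1001'/>");
tracker.sent(1002, "<body rid='1002'/>");
tracker.sent(1003, "<body rid='1003'/>");
tracker.acknowledged(1002);
const toResend = tracker.unacknowledged();
```

After a dropped connection, the client would resend whatever unacknowledged() returns instead of assuming those stanzas arrived, which is the delivery guarantee the comment is after.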