We recently went into private beta on our flagship product and had a small launch event. Unfortunately the venue had a terrible wireless connection and packets were being dropped left, right and centre, causing havoc with our system: basically, it wasn't able to work at all! Luckily we were able to switch to a different network and rescue the demo. This highlighted something that I already knew was an issue, but I hadn't appreciated quite how much of an issue it could be. Our system relies heavily on BOSH and has a rather large JavaScript code base, which now works rather well under good network conditions. However, we need to make it work well under bad network conditions as well.
Because XMPP is a fire-and-forget system, it's not easy to tell whether a message you sent, or were supposed to receive, was actually sent or received. For instance, we have an offer system in which one user sends an offer to another over BOSH. When this message is received by the server, a message is published to the offering user's offers_sent PEP node and a similar message to the receiving user's offers_received PEP node. While the sending user is able to tell (relatively) easily whether their offer was sent, if the notification to the receiving user is never delivered, that user will never know they missed a message.
A little about our JavaScript setup. It has 4 main layers:
- StropheJS
- An MVC framework for dealing with low level tasks and to build on top of
- An application layer which contains the app logic (routes, controllers, models etc.) as well as a browser cache of the model data
- A UI layer that receives events from, and publishes events to, the application layer
One way to solve the missing messages issue would be to periodically check the PEP nodes for new data that the browser doesn't know about. If a new message was discovered, the browser's cache would be invalidated and all new data would be requested from the server. I'm not sure this is the best way to go, and it also doesn't cover all situations. We certainly don't want to get into the situation where we are sending messages to confirm that the previous message was received at its destination, as this would double the network traffic.
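As a rough illustration of the periodic check, here is a minimal sketch of the comparison step only. It assumes the browser can somehow obtain the list of item ids currently on a PEP node; how that list is fetched is left out. The names `findMissingItemIds`, `reconcileNode` and `onStale` are hypothetical and not part of StropheJS or our code base.

```javascript
// Hypothetical helper: returns the ids that exist on the server's PEP node
// but are not yet known to the browser cache.
function findMissingItemIds(serverItemIds, cachedItemIds) {
  const cached = new Set(cachedItemIds);
  return serverItemIds.filter((id) => !cached.has(id));
}

// Hypothetical reconciliation step, intended to be called from a timer.
// `onStale` is a placeholder for whatever invalidates the cache and
// re-requests the data from the server.
function reconcileNode(node, serverItemIds, cachedItemIds, onStale) {
  const missing = findMissingItemIds(serverItemIds, cachedItemIds);
  if (missing.length > 0) {
    onStale(node, missing);
  }
  return missing;
}
```

Something like `setInterval` could drive `reconcileNode` for the offers_received node. The trade-off is the one mentioned above: every check costs a round trip even when nothing was missed.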
With the number of real-time websites growing daily, this is an issue that must have been encountered by other developers, and it would be interesting to see how it's been solved by others. As far as I can see there are two situations in which messages go missing:
- On poor connections messages are not sent or received due to the packets being dropped
- When navigating between pages, a message is received by the browser but is not fully processed and stored in the local cache before the page is unloaded, or a message is added to the send queue but never sent before the page is unloaded
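For the send-queue half of the page unload case, one possible approach (an assumption on my part, not something we have built) is to persist each outgoing message before sending it and only remove it once it is known to have gone out. The sketch below uses a hypothetical `createOutbox` helper that works with any object exposing `getItem` and `setItem`, such as `window.sessionStorage`.

```javascript
// Hypothetical outbox: messages are written to storage before sending and
// removed only after the caller confirms they were sent.
function createOutbox(storage, key = "outbox") {
  const load = () => JSON.parse(storage.getItem(key) || "[]");
  const save = (entries) => storage.setItem(key, JSON.stringify(entries));
  return {
    // Persist the message so it survives a page unload.
    enqueue(id, payload) {
      const entries = load();
      entries.push({ id, payload });
      save(entries);
    },
    // Drop the message once it has been sent.
    markSent(id) {
      save(load().filter((entry) => entry.id !== id));
    },
    // Messages still waiting to be sent, e.g. after a page load.
    pending() {
      return load();
    },
  };
}
```

On each page load, anything returned by `pending()` could be re-sent. Note that this can send a message twice if the page unloads after sending but before `markSent` runs, so the receiving side would need to tolerate duplicates.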
I suspect the hardest issue to solve will be the second one. Any thoughts on the subject would be much appreciated.