I haven't built anything like this, nor do I have any obvious articles to hand, but I can tell you the approach I'd take. There are two bits to that:
- Get a broad understanding of the options available.
- Work out what NFR (non-functional requirement) targets you need to hit. For example, what sort of latency is going to be acceptable? How much data are you expecting to move around?
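To put rough numbers on the second point, a back-of-envelope calculation is usually enough to start with. This is only a sketch: the function name and the figures (10 clients, 5 updates a second, 200 bytes each) are made up for illustration, and it assumes every update has to reach every other client.

```python
def estimate_traffic(clients: int, updates_per_sec: int, bytes_per_update: int) -> dict:
    """Rough traffic estimate when every update must reach every other client."""
    # Updates produced across all clients each second.
    produced = clients * updates_per_sec
    # Each update is delivered to everyone except its sender.
    delivered = produced * (clients - 1)
    return {
        "updates_produced_per_sec": produced,
        "updates_delivered_per_sec": delivered,
        "bytes_delivered_per_sec": delivered * bytes_per_update,
    }


# Illustrative figures only; substitute your own targets.
estimate = estimate_traffic(clients=10, updates_per_sec=5, bytes_per_update=200)
print(estimate)
```

With those assumed figures the total comes to 90,000 bytes delivered per second. Note how the delivered volume grows with roughly the square of the number of clients, which is worth knowing before you commit to a target.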
Pattern
The fundamental question here is how to handle the exchange of data: do you take a multiple peer-to-peer approach or a central hub-and-spoke one?
Yes, it'd be interesting to see what the real-time multiplayer games do.
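One way to compare the two patterns is to count the connections each one needs. The sketch below assumes a full mesh for peer-to-peer (every client linked to every other) and one link per client for hub-and-spoke; the function names are my own.

```python
def mesh_links(clients: int) -> int:
    """Connections in a full peer-to-peer mesh: one per pair of clients."""
    return clients * (clients - 1) // 2


def hub_links(clients: int) -> int:
    """Connections in hub-and-spoke: each client talks only to the hub."""
    return clients


for n in (2, 5, 10, 50):
    print(f"{n} clients: mesh={mesh_links(n)} hub={hub_links(n)}")
```

The trade-off is that the hub keeps the connection count low but has to relay everything, so each update takes two hops instead of one and the server carries the whole load.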
Technology
You're after a web-based solution - does that preclude the use of RIA technology like Silverlight? One thing about multiplayer games is that they aren't built in HTML :)
If you use straight HTML / AJAX / etc. you're forced into a hub-and-spoke architecture - browsers can't call each other, so all comms will need to go via a server. Also, browsers don't accept incoming calls - they only display what they've requested, so each client has to ask the server for anything new.
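To make that concrete, here's a minimal in-memory sketch of the hub side, assuming clients poll for changes. The Hub class, its publish and poll methods, and the sequence-number scheme are all hypothetical; a real version would sit behind HTTP endpoints that the browser calls via AJAX.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """In-memory stand-in for the central server (hypothetical design)."""
    log: list = field(default_factory=list)  # entries are (seq, sender, payload)

    def publish(self, sender: str, payload: str) -> int:
        """A client pushes an update to the hub; returns its sequence number."""
        seq = len(self.log) + 1
        self.log.append((seq, sender, payload))
        return seq

    def poll(self, client: str, since: int) -> list:
        """A client asks for everything newer than the last update it has seen."""
        return [
            (seq, sender, payload)
            for seq, sender, payload in self.log
            if seq > since and sender != client
        ]


hub = Hub()
hub.publish("alice", "move card 3")
hub.publish("bob", "rename column")

# Bob has seen nothing yet, so he asks for everything after sequence 0.
bob_updates = hub.poll("bob", since=0)
print(bob_updates)
```

Each client remembers the highest sequence number it has received and passes it as `since` on the next poll. How often clients poll feeds straight back into the latency target you set earlier.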
Using straight HTML possibly makes it easier for the tool to be used by a wide audience - which might be what you want, or it might not matter to you. Having said that, drag-n-drop isn't trivial to do in today's multi-browser world; using an embedded technology will remove those issues but will probably reduce your user market.
Other Alternatives
- There are a few interactive chat applications and websites about - they might have an approach that suits you.
- Rather than building an app that does all the 'real-time' work, can you leverage existing web-conferencing systems? So, rather than sharing application data between end clients, you could just exchange screen data.
I hope that's of some help.