I have multiple app processes that each connect to servers and receive data from them. Often the servers being connected to and the data being retrieved overlap between processes. So there is a lot of unnecessary duplication of data across the network, more connections than necessary (which taxes the servers), and the data ends up stored redundantly in memory in each app.

One solution would be to combine the multiple app processes into a single one -- but for the most part they really are logically distinct, and that could be years of work.

Unfortunately, latency is critically important and the volume of data is huge. Any one datum may not be too big, but once a client makes a request the server sends a rapid stream of updates as the data changes -- upwards of 20MB/s -- and all of it needs to reach the requesting apps with the shortest possible delay.

The solution that comes to mind is a local daemon process that the app processes would request data from. The daemon would check whether a connection to the appropriate server already exists and, if not, make one. It would then retrieve the data and hand it to the requesting apps via shared memory (due to the latency concern; otherwise I'd use sockets).
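A minimal sketch of that hand-off using named shared memory (via Python's `multiprocessing.shared_memory`; the segment name and record format are invented for illustration). A real daemon would also need a synchronization scheme -- a semaphore or a sequence counter -- so apps never observe a half-written record:

```python
import struct
from multiprocessing import shared_memory

SEG = "md_feed_demo"  # hypothetical segment name agreed on by daemon and apps

# "Daemon" side: create the segment and publish a length-prefixed record.
pub = shared_memory.SharedMemory(create=True, size=4096, name=SEG)
record = b"EURUSD 1.0842"
pub.buf[0:4] = struct.pack("<I", len(record))
pub.buf[4:4 + len(record)] = record

# "App" side: attach to the same segment by name and read the record
# without another copy crossing the network or a socket.
app = shared_memory.SharedMemory(name=SEG)
n = struct.unpack("<I", bytes(app.buf[0:4]))[0]
data = bytes(app.buf[4:4 + n])

app.close()
pub.close()
pub.unlink()
```

The length prefix lets readers know how much of the segment is live; in practice the daemon would keep the segment mapped and update it in place as stream updates arrive.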

A simpler short-term idea that would only solve the redundant connections would be to use unix domain sockets (this will run on a unix OS, though I prefer to stick to cross-platform libs when I can) to share a socket descriptor between all the processes, so they share a single connection. The issue with this is consuming the buffer -- I want every process to see everything coming over the socket, but if I understand correctly, with this approach a read on the socket in one process will prevent the other processes from seeing the same data on their next read (the offset within the shared descriptor will be bumped).

+2  A: 

I believe a dedicated service that exposes the data via shared memory is your best bet. Second to that would be a service that multicasts the data via named pipes, except that you're targeting a Unix variant rather than Windows.

Another option would be UDP multicast, so that the data replication occurs at the hardware or driver level. The only problem is that UDP delivery is not guaranteed to be in order, nor guaranteed to happen at all.
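A loopback sketch of the idea using the stdlib (the group, port, and payload are arbitrary; a real feed would add sequence numbers and a retransmit path precisely because of the ordering/loss caveat):

```python
import socket

GROUP, PORT = "224.1.1.250", 50007  # hypothetical multicast group and port

# Receiver: bind the port and join the group on the loopback interface.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton("127.0.0.1")
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(2)

# Sender: a single sendto() reaches every process that joined the group.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
              socket.inet_aton("127.0.0.1"))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"update 1", (GROUP, PORT))

data, _ = rx.recvfrom(1024)
rx.close(); tx.close()
```

Every subscribed app receives its own copy from one transmission, so the fan-out cost moves out of the application processes.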

I think sharing the physical socket is a hack and should be avoided. You would be better off implementing a driver that did transparently what you want the daemon to do (e.g. each process sees what looks like a normal socket, but internally the virtual sockets all map to a single real one, with logic to re-broadcast the incoming data among them). Unfortunately the level of effort to get that right would be significant, so if time to completion is a concern, sharing the socket isn't really a good route to take (whether done at the driver level, or via some other hacky means such as sharing the socket descriptor cross-process).

Sharing the socket also assumes a push-only connection, i.e. no traffic negotiation occurring at the app level (requests for data, for example, or acknowledgements of receipt).

A quick path to completion may be to look at projects such as BNC and port the code, or hijack the general idea, to do what you need. Replicating traffic to local sockets shouldn't incur huge latency, though you would be exercising the NIC (and associated buffers) for all of the replicated data, and if you are nearing the limits of the hardware (or have a poor driver and/or TCP stack implementation) you may wind up with a dead server. Where I work we've seen data replication tank a gigabit Ethernet card at the driver level, so it's not unheard of.

Shared memory is the best bet if you want to remain platform-independent and performant, while not introducing anything that may become unsupportable in 5 years' time due to kernel or hardware/driver changes.

Shaun Wilson
+1 I would extend this to a combination of multicast over the network and shared memory for local access.
Nikolai N Fetissov
+2  A: 

I recommend that you take a look at ZeroMQ. This might help solve your problem. I don't think 20MB/s is very high ... you should be able to achieve that level of throughput using just ZeroMQ's TCP transport. There is also support for other transport mechanisms, including reliable multicast via OpenPGM, and there are plans to add UNIX pipes as a transport mechanism.
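A hedged sketch of the relevant PUB/SUB pattern in pyzmq (endpoint, topic, and payload are invented; `inproc://` is used only so this runs in one process -- between your daemon and apps you'd use `tcp://` or `pgm://`):

```python
import time
import zmq

ctx = zmq.Context.instance()

pub = ctx.socket(zmq.PUB)
pub.bind("inproc://feed")                 # hypothetical endpoint

sub = ctx.socket(zmq.SUB)
sub.connect("inproc://feed")
sub.setsockopt(zmq.SUBSCRIBE, b"PRICES")  # topic-prefix filter
time.sleep(0.2)                           # let the subscription take effect

# One publish fans out to every subscriber of the topic.
pub.send_multipart([b"PRICES", b"EURUSD 1.0842"])
topic, body = sub.recv_multipart()

sub.close(); pub.close(); ctx.term()
```

Each app would simply be another SUB socket; connection management, reconnection, and fan-out are handled inside the library rather than in your daemon code.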

Messaging will probably be safer and easier than shared memory. Notably, if you use messaging instead of shared memory you can split your application components across a cluster of servers ... which might give you significantly better performance than shared memory, depending on where your bottlenecks are.