Is it possible to multiplex socket connections?

I need to establish multiple connections to Yahoo Messenger, and I am looking for a way to do this efficiently without having to hold a socket open for each client connection.

So far I have to use one socket for each client, and this does not scale well above 50,000 connections.

Oh, my solution is for a telco, so I need to hit at least 250,000 to 500,000 connections.

I'm planning to bind multiple IP addresses to a single NIC to beat the 65k port restriction per IP address.
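
For a rough sense of scale, here is a back-of-the-envelope sketch in Python of how many source IP addresses that would take. It assumes the worst case, where every connection goes to the same remote address and port, and it assumes local ports 1024 to 65535 are usable. Both are assumptions; the real ephemeral port range depends on the OS configuration.

```python
import math

# Assumption: local ports 1024..65535 are available for outgoing connections.
USABLE_PORTS_PER_IP = 65535 - 1024 + 1  # 64512

def source_ips_needed(connections, ports_per_ip=USABLE_PORTS_PER_IP):
    """Minimum number of local IPs if all connections share one remote endpoint."""
    return math.ceil(connections / ports_per_ip)

print(source_ips_needed(250_000))  # 4
print(source_ips_needed(500_000))  # 8
```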

Please, I would appreciate any help or insight I can get.

**Most of my other questions on this site have gone unanswered :)**

Thanks

A: 

You can only multiplex multiple connections over a single socket if the other end supports such an operation. In other words, it's a function of the protocol; sockets don't have any native support for it.

I doubt the Yahoo Messenger protocol has any support for it.

An alternative (to multiple IPs on a single NIC) is to design your own multiplexing protocol and have satellite servers that convert from the multiplexed protocol to the Yahoo protocol.
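
As a minimal sketch of what such a home-grown multiplexing protocol could look like, the Python below tags every payload with a channel id so that many logical client streams can share one TCP connection between your server and a satellite. The frame layout (4-byte channel id, 2-byte length) and the function names are hypothetical, chosen only for illustration; this has nothing to do with the actual Yahoo protocol.

```python
import struct

# Hypothetical frame header: 4-byte channel id, 2-byte payload length (network byte order).
HEADER = struct.Struct("!IH")

def encode_frame(channel_id, payload):
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel_id, len(payload)) + payload

def decode_frames(buffer):
    """Split a byte buffer into complete (channel_id, payload) frames.

    Returns the list of frames and any trailing bytes that do not yet
    form a complete frame (to be kept until more data arrives).
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        channel_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((channel_id, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

The satellite would keep one real Yahoo connection per channel id and copy payloads in both directions, so the per-IP port limit applies to each satellite rather than to the central server.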

Douglas Leeder
Hi Douglas, I don't understand your suggestion. My aim is to maximize the number of clients I can handle on a single server node, so that when I scale out I know that each box has reached its full capacity of clients. How will satellite servers help over multiple IPs on a NIC?
CharlesO
+1  A: 

While you can listen on a single socket for multiple incoming connection requests, each established connection is still a separate socket, identified by its own combination of client address and port and server address and port. In order to multiplex a connection, you need to control both ends of the pipe and either have a protocol that allows you to switch contexts from one virtual connection to another, or use a stateless protocol that doesn't care about the client's identity. In the former case you'd need to implement it in the application layer so that you could reuse existing connections. In the latter case you could get by using a proxy that keeps track of which server response goes to which client. Since you're connecting to Yahoo Messenger, I don't think you'll be able to do this, since it requires an authenticated connection and assumes that each connection corresponds to a single user.

tvanfosson
Hi, thanks, I understand this. But I'm still stuck looking for a scalable way to get this done. I wonder how the guys at Meebo pulled this off :)
CharlesO
+1  A: 

This is an interesting question about scaling in a serious situation.

You are essentially asking, "How do I establish N connections to an internet service, where N is >= 250,000".

The only way to do this effectively and efficiently is to cluster. You cannot do this on a single host, so you will need to partition your client base across a number of different servers, so that each one handles only a subset.

The idea would be for a single server to hold open as few connections as possible (spreading out the connectivity evenly) while holding enough connections to make whatever service you're hosting viable by keeping inter-server communication to a minimum level. This will mean that any two connections that are related (such as two accounts that talk to each other a lot) will have to be on the same host.
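
One way to picture this kind of partitioning is a stable hash from account name to gateway node. The Python below is a minimal sketch under that assumption, not a recommendation of a specific scheme; the node names and the `node_for` helper are hypothetical.

```python
import hashlib

def node_for(account, nodes):
    """Map an account name to one node, the same node every time."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Hypothetical gateway names, for illustration only.
gateways = ["gw-1", "gw-2", "gw-3"]
print(node_for("some_user", gateways))
```

Plain modulo hashing reassigns most accounts when the number of nodes changes, so a consistent-hashing ring may be a better fit if nodes are added or removed often.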

You will need servers and network infrastructure that can handle this. You will need a subnet of IP addresses, and each server will need stateless communication with the internet (i.e. your router should not do any NAT, so that it does not have to track 250,000+ connections).

You will have to talk to AOL. There is no way that AOL will be able to handle this level of connectivity without considering cutting your connection off. Any service of this scale would have to be negotiated with AOL so both you and they would be able to handle the connectivity.

There are I/O multiplexing technologies that you should investigate. kqueue and epoll come to mind.
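
To illustrate the idea, here is a small Python sketch that watches several sockets from a single thread. It uses the standard `selectors` module, whose `DefaultSelector` picks epoll on Linux and kqueue on BSD and macOS where available. The labels and the `socketpair` stand-ins for real IM connections are assumptions made so the example is self-contained.

```python
import selectors
import socket

def poll_once(sel, timeout=1.0):
    """Read from every socket that is ready; return {label: bytes}."""
    received = {}
    for key, _events in sel.select(timeout):
        received[key.data] = key.fileobj.recv(4096)
    return received

def demo():
    sel = selectors.DefaultSelector()  # epoll/kqueue when the platform has them
    pairs = []
    for label in ("alice", "bob", "carol"):
        ours, theirs = socket.socketpair()  # stand-in for a real remote connection
        ours.setblocking(False)
        sel.register(ours, selectors.EVENT_READ, data=label)
        pairs.append((ours, theirs))
    # Only two of the three peers send anything.
    pairs[0][1].sendall(b"hello")
    pairs[2][1].sendall(b"ping")
    result = poll_once(sel)
    for ours, theirs in pairs:
        sel.unregister(ours)
        ours.close()
        theirs.close()
    sel.close()
    return result
```

The point is that idle connections cost no thread and no polling work; only sockets with pending data are returned by `select`.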

In order to write this massively concurrent, telco-grade solution, I would recommend investigating Erlang. Erlang is designed for situations such as these (multi-server, massively multi-client, massively multithreaded telecommunications-grade software). It is currently used for running Ericsson telephone exchanges.

Jerub
Hi Jerub, I have a simple load-balancing algorithm that I use to distribute the incoming load from my front server (hosting the database and SMPP stuff) to my back-end IM gateway boxes. I am thus able to partition and scale out very quickly. My main issue is getting the most out of a single node.
CharlesO
I don't agree with your second statement. The IM provider (AOL, Yahoo) ultimately controls 'chat between accounts'; my solution simply provides a proxy for users to access their IM accounts. I don't allow communication between accounts just because they are both proxying through my server.
CharlesO
Please, Jerub, can you expand on the third suggestion concerning network infrastructure? I don't have deep experience in networking. Thanks.
CharlesO
I don't share your view on item 4. Meebo does this successfully at very large scale, and my volume is just a drop for the IM providers to handle. All my users have legitimate IM accounts and I am just a convenience proxy. If they all logged on individually, the IM provider would have the same load anyway.
CharlesO
kqueue and epoll: I will look them up. But right now I am moving away from the .NET 3.5 xxxxAsync methods. I am trying Net.Sockets.Socket.Select(). It gives me non-blocking reads and is very fast and simple, with no messy socketAsyncArgs to deal with. I'll post some results when I'm done testing.
CharlesO
Erlang, F#, C++, dude... I'm just a lowly (underpaid) VB.NET programmer... LOL!!! Thanks for the suggestion though, but I believe I can pull this off with the right approach, and without going too exotic. Is there an Erlang.NET derivative? I might consider that. Thanks Jerub, I appreciate it.
CharlesO
+1  A: 

I'll throw in another approach you could consider (depending on how desperate you are).

Note that operating system TCP/IP implementations need to be general purpose, but you are only interested in a very specific use-case. So it might make sense to implement a cut-down version of TCP/IP (which only handles your use-case, but does that very well) in your application code.

For example, if you are using Linux, you could route a couple of IP addresses to a tun interface and have your application handle the IP packets for that tun interface. That way you can implement TCP/IP (optimised for your use-case) entirely in your application and avoid any operating system restriction on the number of open connections.

Of course, it's quite a bit of work doing the TCP/IP yourself, but it really depends on how desperate you are - i.e. how much hardware can you afford to throw at the problem.

cmeerw
cmeerw, thanks. I have considered using the raw option, but understand that the YMSG protocol is implemented on top of TCP, so going the way you suggest would really be overkill. What I have done is build a very lightweight parser that converts between bytes[] and YPacket structures.
CharlesO
But it still does not solve my problem of scaling to meet the number of expected client connections this solution will handle.
CharlesO
Again, remember that my solution must communicate with Yahoo and other IM servers, which are all TCP based and whose own IM clients all use standard TCP sockets.
CharlesO
The best I can hope for is to create an ultra-thin client (which I am doing now), of which I can run multiple instances in code without eating up all my server resources, and instantiate as many as possible on a single server node before eventually having to throw more hardware at the problem.
CharlesO
The scalability issues are not just about getting enough ports, which I believe I can address (up to a point) with multiple IP addresses and multiple NICs, but also about memory consumption, even when clients are idle. I cannot disconnect idle clients to save resources because reconnection is too expensive.
CharlesO
A: 

500,000 arbitrary yahoo messenger connections - is your telco doing this on behalf of Yahoo? It seems like whatever solution has been in place for many years now should be scalable with the help of Moore's Law - and as far as I know all the IM clients have been pretty effective for a long time, and there's no pressing increase in demand that I can think of.

Why isn't this a reasonable problem to address with hardware plus traditional solutions?

le dorfier
Thanks for your response. I'm building a Yahoo IM proxy for the telco's mobile users who don't have data/internet access on their phones (a lot of rural areas in the third world, where I live). Moore's law sadly didn't scale everything... they didn't remove the 64k port limit on a single IP in TCP :)
CharlesO