So I'm working through a bit of a problem, and some advice would be nice. First, a little background; please excuse the length.

I am working on a management system that queries network devices via the TL1 protocol. For those unfamiliar with the protocol, the short answer is that it is a "human readable" language that communicates via a text-based IO stream.

I am using Spring and Jsch to open a port to the remote NE (network element), log in, run the command, then close the connection. There are two ways to get into the remote NEs: either directly (via the ssh gateway) if the element has a TCP/IP address (many are OSI only), or through an ems (management system) of some type using what is called a "northbound interface".

Either way, the procedure is the same.

  • Use Jsch to open a port to the NE or ems.
  • Send the login command for the NE, e.g. "act-user:<tid>:<username>:UniqueId::<password>;"
  • Send a command, e.g. "rtrv-alm-all:<tid>:ALL:uniqueid::,,,,;"
  • Retrieve and process results. The results of the above for example might look something like this...

    RTRV-ALM-ALL:foo:ALL:uniqueid;

    CMPSW205 02-01-11 18:33:05

    M uniqueid COMPLD

    "01-01-06:MJ,BOARDOUT-ALM,SA,01-10,12-53-58,,:\"OPA_C__LRX:BOARD EXTRACTED\","

    ;

The ; is important because it signals the end of the response.

  • Lastly, log out and close the port.
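To make the send/receive steps above concrete, here is a minimal Java sketch of the command and response handling only. It leaves Jsch out and works on any reader; the class and method names (`Tl1Io`, `retrieveAllAlarms`, `readResponse`) are made up for illustration, and it assumes the terminating `;` arrives on a line of its own, as in the sample response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of Jsch or Spring. */
final class Tl1Io {
    private Tl1Io() {}

    /** Builds the alarm query in the form shown above: rtrv-alm-all:<tid>:ALL:<id>::,,,,; */
    static String retrieveAllAlarms(String tid, String uniqueId) {
        return "rtrv-alm-all:" + tid + ":ALL:" + uniqueId + "::,,,,;";
    }

    /**
     * Reads lines until one consists solely of ';' and returns everything before it.
     * Assumption: the terminator is on its own line, so the ';' that ends the
     * echoed command on the first line does not stop the read.
     */
    static String readResponse(BufferedReader reader) {
        StringBuilder response = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().equals(";")) {
                    return response.toString();
                }
                response.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalStateException("stream closed before the terminating ';'");
    }
}
```

In the real application the `BufferedReader` would wrap the input stream of the Jsch channel, and the same reader should be reused for every response on that connection so buffered data is not lost.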

With Spring I have been using the ThreadPoolTaskExecutor quite effectively to do this.

Until this issue came up ...

With one particular ems platform (Hitachi) I ran into a roadblock with my approach. This ems handles as many as 80 nodes. You connect to the port, then issue a command to log in to the ems, then run commands pointing to the various NEs. Same procedure as before, but here is the problem...

After you log in to the ems, the next command, no matter what it is, will take up to 10 minutes to complete. Until that happens, all other commands are blocked. After this initial wait all other commands work quickly. There appears to be no way to defeat this behaviour (my suspicion is that there is some NE auto-discovery happening during this period).

Now the thrust of my question...

So my next approach for this platform would be to connect to the ems, log in to it, keep the connection open, and just pass commands to the various NEs. That would mean a 10 minute delay after the application (web based) first loads, but it would be fine after that point.

The problem I have is how best to do this. Having a single text-based iostream for passing this stuff through looks like a large bottleneck. On top of that, multiple users will be using the application, so how do I handle multiple commands and responses against this single iostream? I can open a few iostreams (maybe up to 6) on this ems, but that also complicates sorting out what goes where.
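One possible way to sort out what goes where on a shared stream is to key each pending request on the unique id, assuming the ems echoes that id back in the `M <id> COMPLD` line the way the sample response above does. The sketch below is only an illustration: `ResponseCorrelator`, `register` and `dispatch` are hypothetical names, and it assumes a single reader thread hands over each complete response as one string.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical correlator: routes each response to the caller that used the same id. */
final class ResponseCorrelator {
    // Matches the status line of a response, e.g. "M uniqueid COMPLD".
    private static final Pattern STATUS_LINE =
            Pattern.compile("(?m)^\\s*M\\s+(\\S+)\\s+(\\S+)\\s*$");

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Call before writing the command; the id must be unique among in-flight commands. */
    CompletableFuture<String> register(String uniqueId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (pending.putIfAbsent(uniqueId, future) != null) {
            throw new IllegalStateException("id already in flight: " + uniqueId);
        }
        return future;
    }

    /** Call from the reader thread; returns false if nobody was waiting for this response. */
    boolean dispatch(String response) {
        Matcher matcher = STATUS_LINE.matcher(response);
        if (!matcher.find()) {
            return false;
        }
        CompletableFuture<String> future = pending.remove(matcher.group(1));
        if (future == null) {
            return false;
        }
        future.complete(response);
        return true;
    }
}
```

With something like this, each web request would generate its own id, register it, write the command, and wait on the returned future, so several users can share one stream (or a small pool of them) without reading each other's results.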

Any advice on direction would be appreciated.

A: 

Look at using one process per ems so that communication to each ems is separated. This will at least ensure that communications with the other ems platforms are unaffected by the problems with this one.

You're going to have to build some sort of a command queuing system so that commands sent to the Hitachi ems don't block the user interface until they are completed. Either that, or you're going to have to put a 10 minute delay into the client software before they can begin using it, or a 10 minute delay into the part of the interface that would handle the Hitachi.
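As a rough idea of what such a queue could look like, here is a small sketch using plain `java.util.concurrent` rather than Spring's `ThreadPoolTaskExecutor`. `Tl1CommandQueue` is a made-up name, and the `transport` function stands in for whatever actually writes the command to the socket and reads the reply. One worker thread owns the connection, so commands go out one at a time while callers get a future back immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical queue: serializes commands onto one connection without blocking callers. */
final class Tl1CommandQueue implements AutoCloseable {
    // A single daemon worker thread, so commands are sent strictly in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "tl1-command-queue");
        thread.setDaemon(true);
        return thread;
    });

    // Assumption: takes a command string, sends it, and returns the full response.
    private final Function<String, String> transport;

    Tl1CommandQueue(Function<String, String> transport) {
        this.transport = transport;
    }

    /** Queues the command and returns immediately; the future completes with the response. */
    CompletableFuture<String> submit(String command) {
        return CompletableFuture.supplyAsync(() -> transport.apply(command), worker);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

The user interface can attach a callback with `thenAccept` or poll the future, so a slow first command on the Hitachi only delays the commands queued behind it on that one connection.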

Perhaps it would be a good policy to bring up the connection and immediately send some sort of ping or station-keeping idle command - something benign whose response you don't care about, or that gives no response, but that will trigger the 10 minute delay and get it over with. Your users can become familiar with this 10 minute delay and at least start the application up before getting their coffee or something.

If you can somehow isolate the Hitachi from the other ems platforms in the application's design, this would really ensure that the 10 minute delay only exists while interfacing with the Hitachi. You can connect and issue a dummy command, and put the Hitachi in some sort of "connecting" state where commands cannot be used until the result comes in, and then you change the status to ready so the user can interact with it.
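The "connecting" and ready states could be tracked with something as small as the sketch below. The names (`EmsConnectionState`, `beginWarmUp`, `warmUpCompleted`, `requireReady`) are invented for illustration; the only idea taken from the text is that commands are refused until the dummy command's result has come back.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical state holder for one ems connection. */
final class EmsConnectionState {
    enum State { DISCONNECTED, CONNECTING, READY }

    private final AtomicReference<State> state = new AtomicReference<>(State.DISCONNECTED);

    /** Call right after logging in and sending the dummy command. */
    void beginWarmUp() {
        state.set(State.CONNECTING);
    }

    /** Call when the dummy command's response arrives; only moves CONNECTING to READY. */
    void warmUpCompleted() {
        state.compareAndSet(State.CONNECTING, State.READY);
    }

    State current() {
        return state.get();
    }

    /** Guard to call before accepting a user command for this ems. */
    void requireReady() {
        State now = state.get();
        if (now != State.READY) {
            throw new IllegalStateException("ems not ready, current state: " + now);
        }
    }
}
```

The web layer can show `current()` to the user, so the Hitachi appears as "connecting" for the first 10 minutes while the other platforms are usable right away.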


One other approach would be to develop some sort of middleware component - I don't know if you've already done this. If the clients are all web-based, you could run a communications piece on the webserver which takes connections from the clients and pipes them through one piece on the webserver which communicates with all of the ems platforms. When this piece starts up on the webserver, it can connect to each ems and send some initial ping command which starts the 10-minute timer. Once this is complete, the piece on the webserver could send keepalive messages every so often, again some sort of dummy command, to keep the socket alive so it wouldn't have to reset and go through the 10-minute wait time again. When the user brings up the website, they can communicate with this middleware server piece which would forward the requests to the appropriate ems and forward the response back to the client -- all through the already open connection.
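The keepalive part could be sketched with a scheduled executor along these lines. `KeepAlive` and its `start` method are hypothetical, the `sender` stands in for whatever writes to the open connection, and the interval is something you would have to find by experiment, since nothing here says how long the ems waits before dropping an idle session.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical keepalive: periodically sends a benign command on an open connection. */
final class KeepAlive implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "tl1-keepalive");
                thread.setDaemon(true);
                return thread;
            });

    /**
     * Sends the given command every period, measured from the end of the previous send.
     * Assumption: sender writes the command to the already open connection.
     */
    ScheduledFuture<?> start(Consumer<String> sender, String command, long period, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(() -> {
            try {
                sender.accept(command);
            } catch (RuntimeException e) {
                // Swallow the failure so one bad send does not cancel the schedule;
                // a real implementation would log it and probably reconnect.
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

If the keepalive goes through the same command queue as user requests, it cannot interleave with a user's command on the stream.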

Erick Robertson
To answer some of the points: the application is running under Tomcat, so the delay would only be noticed when I restart Tomcat. There is already a TL1 command that works as a kind of ping, the "rtrv-hdr" command.
Bill
So are you looking for how to actually develop the queuing structure that would handle taking the commands, sending them through the socket, retrieving the results, and sending them back to the appropriate client?
Erick Robertson