This is the problem description: We have thousands of devices (approx. 4k-5k) from which we have to read data continuously, every 30 seconds to 2 minutes. Each device has its own unique IP. This data would be collected and then stored in a database. These devices are at hundreds of locations around the country. The data would not be read 24x7, but for at least 12 hours a day.

There is a web application which, at some point, will request to show the data being collected from these devices. We would know which device's data is being requested.

This is how we think we can implement it in Java:

Solution A: In each location, designate one machine which acts as a server and reads data from x number of devices. On this designated machine the data is pulled and stored locally (flat file or in-memory database), and then pushed to a central server every hour.

In this case we will have as many servers as locations. For example, we might end up with 1500 servers/machines to manage, which becomes a nightmare.

Solution B:

We have 8-10 central servers, and each server reads data from a group of devices. The data gets queued up and is picked up in the order in which it arrived.

The servers push the data to the database.
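
To make Solution B more concrete, here is a rough sketch of the kind of polling/queueing loop we are considering (the port, the line-oriented read, the pool sizes, and the database write are placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.List;
    import java.util.concurrent.*;

    public class DeviceCollector {

        // One reading taken from one device.
        static class Reading {
            final String deviceIp;
            final String payload;
            final long timestamp = System.currentTimeMillis();
            Reading(String deviceIp, String payload) {
                this.deviceIp = deviceIp;
                this.payload = payload;
            }
        }

        private final BlockingQueue<Reading> queue = new LinkedBlockingQueue<>();
        private final ScheduledExecutorService pollers = Executors.newScheduledThreadPool(100);
        private final ExecutorService dbWriter = Executors.newSingleThreadExecutor();

        public void start(List<String> deviceIps) {
            // Poll each device every 30 seconds; connections are short-lived.
            for (String ip : deviceIps) {
                pollers.scheduleAtFixedRate(() -> poll(ip), 0, 30, TimeUnit.SECONDS);
            }
            // A single consumer drains the queue in arrival order and writes to the DB.
            dbWriter.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        storeInDatabase(queue.take());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        private void poll(String ip) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(ip, 4001), 5000);  // port is a placeholder
                socket.setSoTimeout(5000);
                BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
                queue.put(new Reading(ip, in.readLine()));              // assumes a line-oriented protocol
            } catch (Exception e) {
                // log and move on; the next scheduled poll will retry
            }
        }

        private void storeInDatabase(Reading r) {
            // placeholder: batch-insert via JDBC in the real system
        }
    }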

How does the client get the data?

In Solution B, the client gets it from the database, assuming the data has already been pushed into the DB and is no longer sitting in the queue.

Which do you think would work better?

Any alternate design/solution?

Should we think about programming the server side with Unix tools/Perl? We do not want to use C++ for other reasons.

A: 

Try Netty.

duffymo
This looks to be pretty heavyweight for what we are trying to do.
vsingh
+1  A: 

You've not mentioned having the clients (the devices) talk to the servers, rather than vice versa. Is that an option? You don't mention the volumes of data being transferred, either.

The figures you mention don't sound unreasonable for a Java server (with appropriate connection pooling etc.). Try prototyping some solutions just to test the communications and threading/connection pools. And check out frameworks such as Apache Mina.
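
As a starting point for such a prototype, a minimal sketch of a short-lived device read with Apache MINA 2.x might look like the following; it assumes a line-oriented text protocol, and the host, port, and handler body are placeholders:

    import java.net.InetSocketAddress;
    import org.apache.mina.core.future.ConnectFuture;
    import org.apache.mina.core.service.IoHandlerAdapter;
    import org.apache.mina.core.session.IoSession;
    import org.apache.mina.filter.codec.ProtocolCodecFilter;
    import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
    import org.apache.mina.transport.socket.nio.NioSocketConnector;

    public class MinaDevicePoller {
        public static void main(String[] args) {
            NioSocketConnector connector = new NioSocketConnector();
            connector.setConnectTimeoutMillis(5000);
            // Decode the byte stream into text lines (assumes a line-oriented device protocol).
            connector.getFilterChain().addLast("codec",
                    new ProtocolCodecFilter(new TextLineCodecFactory()));
            connector.setHandler(new IoHandlerAdapter() {
                @Override
                public void messageReceived(IoSession session, Object message) {
                    System.out.println("reading from device: " + message);  // hand off to the DB queue here
                    session.close(true);
                }
            });

            // Connect to one device; in the real system this would be driven by a scheduler.
            ConnectFuture future = connector.connect(new InetSocketAddress("10.0.0.1", 4001));
            future.awaitUninterruptibly();
            future.getSession().getCloseFuture().awaitUninterruptibly();
            connector.dispose();
        }
    }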

Brian Agnew
The client will be talking to the server, and the server will pull data from the DB, which has been collected over sockets. Will look into Apache Mina.
vsingh
+3  A: 

The requirements stated in your question do not imply thousands of concurrent connections, as you can easily build each connection anew every 30 seconds. Assuming a connection can be opened, read from, and disposed of within 500 ms, that leaves 5000 / 30 * 0.5 ~= 83, i.e. roughly 100 concurrent connections. Any decent OS should be able to handle that many. With such low concurrency, you could even get away with a single server, with each connection worked by a dedicated thread.

Your design should therefore focus on your other requirements. A few ideas:

  • Are the devices firewalled? With solution A you will have outgoing connections from each location, with solution B you will have incoming ones.
  • What kind of reliability do you need? For instance, do you need to record measurements if a location's internet connection is down? That would imply a local server buffering the measurements.
meriton
5000 / 30 * 0.5 ~= 100 concurrent connections. Yes, that is the correct number. All these devices are within a VPN. Thanks for asking these questions.
vsingh
+2  A: 

If it's possible, I think your clients should be sending off JMS messages to some sort of queue, and then you process the queue to store the data in the database. There's ActiveMQ, which would work nicely for this. There's also SQS (from Amazon) if you like cloud-based deployments; your Java servers that talk to the master DB could just pull from that.
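
For illustration, a minimal producer-side sketch using ActiveMQ through the standard JMS API might look like this (the broker URL, queue name, and message payload are placeholders):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ReadingPublisher {
        public static void main(String[] args) throws Exception {
            // Broker URL and queue name are placeholders.
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();

            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Destination queue = session.createQueue("device.readings");
            MessageProducer producer = session.createProducer(queue);

            // One message per device reading; a consumer on the DB side drains the queue.
            TextMessage message = session.createTextMessage("10.0.0.1,2010-06-01T12:00:00Z,42.7");
            producer.send(message);

            session.close();
            connection.close();
        }
    }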

dstarh
+3  A: 

If you maintain the connections, you should be able to poll each connection in under 20 microseconds. This means you could poll every connection in under 100 ms using just one non-blocking thread (perhaps the least efficient way to do this).

Using a Selector is a better approach, as it gives you a Set of just the connections that are ready.

If you create a new connection each time, this is far more expensive and can take around 20 milliseconds (longer depending on the latency of your network). To poll 5000 connections in 30 seconds you would need to keep 3-4 active at any time (most of the time would be spent establishing and tearing down connections). You can do all this with one thread, but using a small thread pool might be simpler.
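
A rough sketch of the Selector approach with plain java.nio, keeping connections open and reading from whichever ones have data ready (the device addresses, port, and read handling are placeholders):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class SelectorPoller {
        public static void main(String[] args) throws Exception {
            List<String> deviceIps = Arrays.asList("10.0.0.1", "10.0.0.2");  // placeholders
            Selector selector = Selector.open();

            // One non-blocking connection per device, all registered with a single selector.
            for (String ip : deviceIps) {
                SocketChannel channel = SocketChannel.open();
                channel.configureBlocking(false);
                channel.connect(new InetSocketAddress(ip, 4001));            // port is a placeholder
                channel.register(selector, SelectionKey.OP_CONNECT);
            }

            ByteBuffer buffer = ByteBuffer.allocate(4096);
            while (true) {
                selector.select(1000);                                       // wait for ready channels
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    SocketChannel channel = (SocketChannel) key.channel();
                    if (key.isConnectable() && channel.finishConnect()) {
                        key.interestOps(SelectionKey.OP_READ);               // connected, now wait for data
                    } else if (key.isReadable()) {
                        buffer.clear();
                        int n = channel.read(buffer);
                        if (n > 0) {
                            // hand the bytes off to parsing and the database queue here
                        } else if (n < 0) {
                            key.cancel();                                    // device closed the connection
                            channel.close();
                        }
                    }
                }
            }
        }
    }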

Peter Lawrey
Talking about time limits: we need to create a thread, open a connection, read data, close the connection, and the thread completes. It would not make sense to keep the thread alive for the 30-second duration.
vsingh
You don't need thousands of threads, but if you had a thread pool with thousands of tasks (see Executor) then you would keep the threads alive all the time, because you would be polling one device or another all the time. Sometimes the cost of creating and destroying threads is greater than the cost of keeping them. Similarly, if you could keep the connections open you would reduce the overhead on the server, and possibly the client, considerably.
Peter Lawrey