
Hello,

For one of our projects, we need to pull large volumes of real-time stock data from 4 remote servers in two countries. The basic process is simple: poll the sources at a regular interval and save the updates to a database.

But since this is real-time stock data for more than 1000 companies, I have to pull every second, which I think will be costly in terms of memory and bandwidth.

Please suggest which technology/platform we should choose so that this can be achieved easily and with good performance. (We are flexible here: PHP, Python, Java or Perl would all be OK for us.)
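A minimal Python sketch of the poll-and-store loop described above. The fetch callables are hypothetical stand-ins for the four remote servers (their real interface isn't described), the `(symbol, price, timestamp)` quote shape is an assumption, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional, Tuple

# Assumed quote shape: (symbol, price, unix_timestamp).
Quote = Tuple[str, float, float]
# Hypothetical: each source is a zero-argument function returning its latest quotes.
Fetcher = Callable[[], List[Quote]]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quotes ("
        "source TEXT, symbol TEXT, price REAL, ts REAL)"
    )


def poll_once(conn: sqlite3.Connection, sources: Dict[str, Fetcher]) -> int:
    """Fetch from every source once and store all quotes; returns rows written."""
    rows = []
    for name, fetch in sources.items():
        for symbol, price, ts in fetch():
            rows.append((name, symbol, price, ts))
    # One transaction per polling round instead of one commit per row.
    with conn:
        conn.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def poll_forever(
    conn: sqlite3.Connection,
    sources: Dict[str, Fetcher],
    interval: float = 1.0,
    rounds: Optional[int] = None,
) -> None:
    """Poll every `interval` seconds; `rounds=None` means run indefinitely."""
    done = 0
    while rounds is None or done < rounds:
        started = time.monotonic()
        poll_once(conn, sources)
        done += 1
        # Sleep only for whatever is left of the interval after the work.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

Batching each round into a single `executemany` transaction is a deliberate choice: with 1000+ symbols every second, committing row by row would likely become the first bottleneck.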

+1  A: 

It is unlikely that any particular technology will be so much better than all the others that it becomes the clear choice. What matters is designing the system appropriately; the language you use will be of little consequence.

Also, your question is really unanswerable without a lot more information than you provided (information you cannot possibly provide here).

Moron
@Moron, think about changing that name ;-D
pavium
If I do, all the @Moron comments I have been getting will suddenly turn rude :-)
Moron
+1  A: 

If you want huge, real-time data, chances are that the protocol matters much more than the language. However, here are a few aspects you may wish to take into account:

  • HTTP was never designed for huge volumes of data or for real-time delivery, so if you can use a more appropriate protocol, you're probably better off. Off the top of my head, and if I recall correctly, FTP is one example of a more bandwidth-friendly protocol than HTTP, although it's certainly not the best one.
  • Given that you will be polling constantly, you're probably better off with a language that has primitives for asynchronous I/O and handles threads robustly.
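To illustrate the asynchronous-I/O point, here is a Python sketch using the standard `asyncio` module to query several servers concurrently. The `Source` interface (a zero-argument coroutine function returning a list of quotes) is a hypothetical assumption; the actual network call is left out. Java NIO or Erlang processes would fill the same role.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical interface: each source is a zero-argument coroutine function
# that performs the real network request and returns a list of quotes.
Source = Callable[[], Awaitable[List[tuple]]]


async def poll_all(sources: Dict[str, Source]) -> Dict[str, List[tuple]]:
    """Query every source concurrently and collect the successful results."""
    names = list(sources)
    # gather() runs all requests at once, so one round takes roughly as long
    # as the slowest server rather than the sum of all of them.
    results = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    collected = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            # Skip a failing server instead of letting it abort the whole round.
            continue
        collected[name] = result
    return collected
```

Calling `asyncio.run(poll_all(sources))` once per interval replaces a sequential loop over the four servers.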

I'd personally go for Erlang: built-in high-speed protocols for distributing data, asynchronous everything and probably the best implementation of concurrency and distribution this side of academia.

If you are limited to the list of languages you provided, I'd go for Java. Its I/O is a tad complicated but rather powerful; the library is so large that it's bound to have what you need somewhere; it supports asynchronous I/O; and it handles threading quite decently.

That said, I'd concentrate more on the protocol than the language. No matter what language you use, there's bound to be a library for your protocol.

Yoric
Thank you very much.
Habib Ullah Bahar
+1  A: 

Here are some names you can go and explore: HTTP, XMPP, AMQP, ZeroMQ. Implementation-wise, software written in Erlang that supports clustering might be a good fit.

Tobu
A: 

As mentioned, the protocol is more important than the technology. In my experience, you are likely to be sending very similar data at short (possibly non-standard) intervals, so with a little effort you can develop a lightweight data/content schema and send everything as raw binary. Please avoid JSON/XML for your data: they are far too bulky and slow for too little benefit, unless you have plenty of resources and bandwidth.
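To illustrate the size difference, here is a Python sketch of one possible fixed-size binary quote record built with the standard `struct` module. The field layout (8-byte symbol, price as a scaled integer, volume, millisecond timestamp) is purely an assumption for the example, not a schema given in the answer.

```python
import json
import struct

# Hypothetical layout, little-endian with no padding ("<"):
#   8s  symbol, ASCII, NUL-padded to 8 bytes
#   q   price as a signed 64-bit integer in ten-thousandths (123.4567 -> 1234567)
#   I   volume as an unsigned 32-bit integer
#   Q   timestamp in milliseconds as an unsigned 64-bit integer
QUOTE = struct.Struct("<8sqIQ")


def encode_quote(symbol: str, price_ticks: int, volume: int, ts_ms: int) -> bytes:
    return QUOTE.pack(symbol.encode("ascii"), price_ticks, volume, ts_ms)


def decode_quote(buf: bytes):
    symbol, price_ticks, volume, ts_ms = QUOTE.unpack(buf)
    return symbol.rstrip(b"\0").decode("ascii"), price_ticks, volume, ts_ms


if __name__ == "__main__":
    binary = encode_quote("AAPL", 1234567, 1500, 1700000000000)
    as_json = json.dumps(
        {"symbol": "AAPL", "price": 123.4567, "volume": 1500, "ts": 1700000000000}
    ).encode("utf-8")
    print(len(binary), len(as_json))  # 28 74
```

The binary record is a fixed 28 bytes, while the equivalent JSON object is more than twice that, largely because the field names are repeated in every message. Storing the price as a scaled integer also avoids floating-point rounding in transit. Note that `struct` silently truncates symbols longer than 8 bytes, so a real schema would need to choose that width deliberately.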

Pick a protocol that will be fast at this (I can't offer many suggestions here, but don't use HTTP).
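If the binary data goes over a plain TCP connection instead of HTTP, the receiver needs some way to tell where one message ends and the next begins. A common approach, shown here as a Python sketch and not something the answer prescribes, is to prefix each message with its length. The function names are illustrative only.

```python
import socket
import struct

# 4-byte unsigned length prefix in network (big-endian) byte order.
HEADER = struct.Struct("!I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message as <length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)
```

The prefix costs 4 bytes per message, so batching many quotes into one frame keeps that overhead negligible.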

Aea