views:

1091

answers:

10

I am building a Web Service API for my application, and I plan to expose the service via both REST and SOAP.

I'm interested in getting some feedback from the community as to which programming language I should choose to implement the service. (I know C#, Java and Ruby/RoR well enough to create the service.)

The service is mainly an HTTP POST service. It will need to handle around 2,000 concurrent connections and around 10,000 HTTP POSTs per second. (For SOAP, we will have a submit method for the clients to call.)

The service does not return any response to the client for the POST requests.

Any ideas on which programming language/architecture should be used?

+36  A: 

10,000 requests per second is 25 billion hits per month. That means one of two things:

  1. Your application is more popular than MySpace; or
  2. You are trying to use this to communicate between two very chatty components that you control, and it's a horrible design choice.

The switching hardware alone to distribute that much load across a farm of web front-ends would cost many thousands of dollars.

Start by writing a web service that can handle 50 requests per second (language choice is not hugely relevant). If your application is so busy that you cross that threshold regularly, you can afford to hire someone to work on the scaling problem full-time, and not have to ask for help on a free Q&A site.
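To make the arithmetic behind the monthly figure explicit, here is a minimal sketch. The 30-day month and the function name `hits_per_month` are assumptions for illustration; neither comes from the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def hits_per_month(requests_per_second, days=30):
    """Total requests over `days` days at a sustained (not peak) rate."""
    return requests_per_second * SECONDS_PER_DAY * days


sustained = hits_per_month(10_000)  # the rate from the question, held constantly
modest = hits_per_month(50)         # the suggested starting target
```

Under these assumptions, a sustained 10,000 requests per second works out to roughly 25.9 billion per month, while 50 per second is about 130 million per month. If 10,000 is a peak rather than an average, the monthly total would be lower.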

Rex M
Chuck Norris couldn't even handle 10,000 requests per second.
Chris Ballance
Jon Skeet could.
bigmattyh
Chuck norris doesn't take requests; he gives orders.
SnOrfus
Chuck Norris doesn't take requests; he rips them from you.
Ólafur Waage
"Chuck norris doesn't take requests; he gives orders." Oh my that was funny.
Cj Anderson
While I agree that this is the right answer, I would disagree with the analysis that 10,000 reqs per sec = 25 billion hits per month. At steady state, 10,000 reqs per sec would mean that many hits per month, but presumably the OP is planning for peak, not steady state. The answer is good.
EBGreen
@EBGreen true, if 10000 is peak and not average. But 10000r/s peak would be worth a pretty penny, too :)
Rex M
It certainly would. And if you couldn't monetize that much traffic enough to hire devs that could handle the related problems, then you deserve to have your site come to a grinding halt.
EBGreen
Class! Read http://www.kegel.com/c10k.html :) 10000 requests per second on a single commodity machine have been possible for at least 10 years now! We can certainly do better with today's hardware!
pi
@pi that works in a controlled environment where the requests don't represent real units of work. Once each request involves actual processing, you need much more substantial hardware than what it takes to merely receive the requests. In the real world, that is not realistic.
Rex M
I think the point of his question is just to determine which language might be better for such a high-volume service (the answer might be it doesn't matter, and that hardware will be a bigger issue). Either way, no need to get snarky in your answer Rex.
Sam Schutte
+1  A: 

You can really use any language through CGI (Common Gateway Interface), so it comes down to performance. Among the languages you list, I expect C# to be the fastest. A good comparison of speed among languages is The Language Shootout.

If you really need performance you might want to look in the direction of a more performance oriented language like C or D to handle the requests.

It all depends on what kind of computation each request has to perform, really.

Zuu
CGI is a very poor choice, as it adds the cost of spawning the process (which can be very high) to the actual processing. I have recently been fighting to port a CGI architecture to an always-in-memory one. Early prototypes suggest a 10-100x performance increase!
pi
Well, you are wasting your time; take a look at FastCGI (which is among the things I refer to when simply saying 'CGI'). FastCGI is designed with the intent that your CGI application never respawns but stays in memory indefinitely. So no, there will be no overhead from spawning a process. ;-)
Zuu
+10  A: 

At 10,000 posts a second, the language is the least of your worries. A much bigger issue would be the design of your server farm and network. I assume you don't plan on running this on a single box?

Jason
A: 

Update: It's meant to be a fire-and-forget web service. I guess I will send back a simple HTTP 200/OK response.

No, this is not intended to run on a single box. It's intended to run on a few boxes (say 3-4).

When the requests are received, they are pushed to queues on other machines, then they are taken and put in an HBase/Voldemort store.

As I said, it's meant to be a "fire and forget" web service.
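The flow described here (accept the POST, hand it to a queue, reply 200/OK right away) could be sketched roughly as follows. This is only an illustration using the Python standard library: `pending`, `SubmitHandler`, `drain` and `make_server` are hypothetical names, the in-process `queue.Queue` stands in for the queues on other machines, and a plain list stands in for the HBase/Voldemort store.

```python
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process buffer; in the setup described above this would be
# a queue on another machine feeding the data store.
pending = queue.Queue()


class SubmitHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        pending.put(body)          # hand off; no storage work on the request path
        self.send_response(200)    # acknowledge immediately
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def drain(store, stop):
    """Worker loop: move queued bodies into `store` (a stand-in for the real store)."""
    while not stop.is_set() or not pending.empty():
        try:
            store.append(pending.get(timeout=0.1))
        except queue.Empty:
            continue


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SubmitHandler)
```

To try it, call `make_server()`, run `serve_forever()` in a thread, and start `drain` in another thread with a list and a `threading.Event`. A thread-per-request server like this is meant to show the shape of the hand-off, not to reach the request rates discussed in this thread.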

Ray Dookie
You should edit your question to include this as a clarification, or comment directly on an answer. This is not a messageboard.
Rex M
There is no way you can handle that many requests/second with 3-4 machines. Good luck!
Ryan Doherty
"3-4 boxes" is still 2500-3300 requests per second on a single machine. That is just not realistic.
Rex M
@Rex M: 10000 requests per second on a single machine was possible in 1999. Take a look at: http://www.kegel.com/c10k.html
pi
+6  A: 

Highly scalable, reliable, distributed applications that use multicore/multiprocessor systems? Here I immediately think of Erlang/OTP together with Yaws as the web application server. Yaws runs extremely stably and fast under extremely high load. And Erlang/OTP as the platform is designed for concurrency and distribution, together with some mechanisms that help you develop stable software. The costs: concurrency-oriented programming in a functional language is not like OOP with Java or C#, the syntax seems weird (but is very straightforward and powerful once you've adopted it), and the number of third-party libraries is not as huge as for the mainstream languages. But it's worth it.

Hope this helps

mue

Mue
+2  A: 

At that rate, and since you're breaking HTTP anyway (no response) you might as well develop your own server, or modify an open source server.

Write it all in C or C++ and you'll be blazing about as fast as possible.

Scalability is affected by more than language choice though.

Adam Davis
May be ok. Though use HTTP! Great mature protocol and great mature implementations available. Great balancing solutions available.
pi
A: 

You need a C-like language, and to avoid writing a complete server I would suggest CGI (which is what PHP and the like all run through anyway). Windows servers offer ISAPI plug-ins, but these run in the context of the server, so memory leaks and GPFs will take down the server. Add to that the inconvenience of stopping and starting the server each time you change something, and CGI/FastCGI looks better.

Mike Trader
+2  A: 

I could see getting a billion posts per month out of a single machine. I have a web service written in C# that's currently handling about 3.5 million posts per day. The web server is running along at 3% CPU utilization, which means I could push it at least 20 times as hard...

Assuming each of your machines had four six-core Xeons, 32GB of RAM, a fast disk array, and a database highly optimized for writes, you could do it. The cost of each server, though, is probably in the $35K to $40K range.

Regardless, your bottleneck would not be with C# or Java. It would be with the database server depending on how large it grows. In my case, it's about 300GB with 10GB being deleted and 10GB being added per day.
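As a rough check of the numbers in this answer, the extrapolation can be written out as below. It assumes throughput scales linearly with CPU utilization and uses a 30-day month; both are simplifying assumptions, and the variable names are only illustrative.

```python
SECONDS_PER_DAY = 86_400

posts_per_day = 3_500_000   # current load reported above
headroom = 20               # "at least 20 times as hard"

avg_rate = posts_per_day / SECONDS_PER_DAY        # average posts per second today
scaled_per_day = posts_per_day * headroom         # daily posts at 20x the load
scaled_per_month = scaled_per_day * 30            # monthly posts at 20x the load
scaled_rate = scaled_per_day / SECONDS_PER_DAY    # average posts per second at 20x
```

Under those assumptions the current load averages about 40 posts per second, and the 20x figure comes to roughly 810 posts per second, or about 2.1 billion posts per month. These are averages; peak rates would be higher.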

Chris Lively
+2  A: 

Based on my previous experience I can give you the following advice.

  1. Pick the language you (and possibly other team members) like most. I would prefer higher level languages because hardware is fast and cheap, but programmers are slow and expensive.
  2. Design your services to be absolutely state-free (no sessions!). This makes it easy to add new hardware, as the different instances of your service need not know of each other.
  3. Handle your processing asynchronously, as you fortunately need not give the client any response (other than OK). If you do it synchronously your process will block and your request-rate will drop. A good read is this Wikipedia article, and especially (the classic!) The C10K problem.
  4. Put the service on many machines. (depending on the speed of your services)
  5. Put your database server(s) on other machines than the web-services. Use fast disks!
  6. Handle the load by balancing it with something like:
    • Linux Virtual Server, the most performant solution, because it runs in the kernel. Scales like mad. I used it in 2003 with ~500 req/sec on a P3/1GHz with 0.1% CPU load. Can be paired to achieve HA. Should handle the 10,000 req/sec quite nicely on a single machine. Do this after trying something simpler. This can be quite challenging.
    • Pound, very easy to set up but high overhead. Good starting point. Can do SSL.
    • Nginx, easy configuration, very performant. Can do SSL. Can also act as HTTP-Server and may be a performant hosting solution for your services.
    • Perlbal, haven't used it but heard good things.
    • or other reverse proxies.
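Point 3 (asynchronous processing) can be illustrated with a small sketch. It assumes Python's `asyncio`; `handle_post`, `writer` and `demo` are hypothetical names, and a plain list stands in for the database. The request path only enqueues and acknowledges, while a background task drains the queue and writes in batches.

```python
import asyncio


async def handle_post(body, inbox):
    """Accept a request: enqueue the body and acknowledge without waiting on storage."""
    await inbox.put(body)  # with a bounded queue this waits when the queue is full
    return 200


async def writer(inbox, store, batch_size=100):
    """Background task: drain the queue and write to `store` in batches."""
    while True:
        batch = [await inbox.get()]
        while len(batch) < batch_size and not inbox.empty():
            batch.append(inbox.get_nowait())
        store.extend(batch)  # stand-in for one bulk write to the database
        for _ in batch:
            inbox.task_done()


async def demo(n=1000):
    inbox = asyncio.Queue(maxsize=10_000)
    store = []
    task = asyncio.create_task(writer(inbox, store))
    statuses = await asyncio.gather(*(handle_post(i, inbox) for i in range(n)))
    await inbox.join()  # wait until everything queued has been written
    task.cancel()
    return statuses, store
```

Running `asyncio.run(demo())` returns the list of status codes and the stored items. The bounded queue is a design choice: when the writer falls behind, new requests wait rather than letting memory grow without limit.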
pi
+1  A: 

Let's look at the issues:

IO: this will easily be the greatest bottleneck in your system. Pick a language that provides the best integration with the host OS and offers advanced semantics for non-blocking IO and, optionally, support for concurrency.

Data: SOAP? XML? You will want to minimize any unnecessary CPU cycles. What's wrong with simply using JSON? (And there is no divine dictate that says a REST-architecture-based server can't use binary data in the protocol ...)
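As a rough illustration of the envelope overhead, the sketch below encodes the same small record as compact JSON and as a minimal SOAP 1.1 envelope and compares their sizes. The record fields are hypothetical, the `submit` element is borrowed from the method name in the question, and byte count is only a proxy; actual parsing cost would have to be measured.

```python
import json

record = {"user": 42, "event": "click"}  # hypothetical payload

# Compact JSON encoding of the record.
json_body = json.dumps(record, separators=(",", ":"))

# The same data wrapped in a minimal SOAP 1.1 envelope.
soap_body = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><submit>"
    "<user>42</user><event>click</event>"
    "</submit></soap:Body></soap:Envelope>"
)

overhead = len(soap_body) / len(json_body)  # size ratio, SOAP relative to JSON
```

For a payload this small, the envelope and namespace declaration dominate the SOAP message, which is why the ratio is well above 1.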

Content: If any transformation of data (from text to number, for example) is involved, you will also need to consider which language provides the most efficient mechanisms. As an example, in Java (which is a very strong candidate for you, btw) the String class is a serious CPU hog.

Java and Erlang are very good candidates. C is always an option but concurrent programming is much more difficult.