We've deployed our rails app to EC2. In our setup, we have two proxies on small instances behind round-robin DNS. These run nginx load balancers for a dynamically growing and shrinking farm of web servers. Each web server also runs nginx with a cluster of mongrels. The nginx here takes care of static content and load balancing the mongrels.
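
(For the curious, the nginx on each web server looks roughly like this; the ports, paths and upstream name below are simplified placeholders rather than our exact config:)

    # Each web server: nginx serves static files itself and
    # round-robins everything else across the local mongrels.
    upstream mongrels {
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
    }

    server {
        listen 80;
        root /var/www/app/current/public;

        # static assets straight from disk
        location ~ ^/(images|stylesheets|javascripts)/ {
            expires max;
        }

        location / {
            proxy_set_header Host $host;
            proxy_pass http://mongrels;
        }
    }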

Anyway, our traffic is by and large HTTPS. We have the 2 proxies taking care of SSL. I've noticed that our network throughput on those instances caps out at only 60 Mbps or so. By contrast, in testing I can consistently get 700+ Mbps on a small instance over plain HTTP. In fact, that's the same as what I can get on a large instance, and similar to what the Right Scale guys got in their testing. (Amazon says a small gets "moderate" network I/O, while a large gets "high". If I had to speculate, I'd say this is just their way of saying that there are more small instances per physical box sharing one network card. I'm not sure whether it means a large gets a dedicated network interface, but I doubt it.)

In testing, I was able to get a large instance to get about 250 Mbps SSL. This says to me that the CPU or some other resource is the bottleneck. However, our monitoring graphs don't show the CPU on our proxies being particularly busy.
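
(A quick way to gauge the raw crypto capacity of a proxy's CPU, independent of the network, is openssl's built-in benchmark, run on the instance itself:)

    # rough CPU-only crypto benchmark; no network involved
    openssl speed rsa2048 aes-256-cbc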

My questions are:

  1. Is my instinct about SSL being slower due to CPU correct and our monitoring graphs are wrong? Or could some other resource be the limiting factor?
  2. Should we just take the extra cost and put the proxies on high-CPU instances? Or would it be better to just add more small instances?
  3. Should we offload the SSL termination to the web servers? This introduces one more problem, though: how do we get the client IP address in our application? Right now our proxy sets it in the X-FORWARDED-FOR header, but obviously this wouldn't be possible if it's not decrypting SSL.
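
(For reference, the header bit is just a couple of proxy directives on whichever tier ends up terminating SSL; the upstream name here is a placeholder:)

    # on the SSL-terminating proxy: decrypt, then tag the client address
    # before handing the request to a backend web server
    location / {
        proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header  X-Real-IP        $remote_addr;
        proxy_set_header  Host             $host;
        proxy_pass        http://web_servers;
    }

Rails then picks the client address out of X-Forwarded-For via request.remote_ip.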

I'd love to hear about any similar setups. We tinkered a bit with their Elastic Load Balancer, but I think that basically puts us in the same situation as #3 above. Has anyone else made the switch to ELB and found it to be worth it?

A: 

SSL being slower: true. Compared to a plain HTTP request, the same request over HTTPS/SSL will be slower.

Try creating a similar setup on a local LAN, where you have 3 mongrel clusters and 2 web servers, and check it with curl-loader, sending about 5k requests.
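
(If curl-loader isn't to hand, ApacheBench gives a similar rough check; the hostname is a placeholder, and the https run needs ab built with SSL support:)

    ab -n 5000 -c 50 http://lan-test-box/
    ab -n 5000 -c 50 https://lan-test-box/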

If everything is fine there, that's great; otherwise you may have to work it out with the EC2 guys.

T.Raghavendra
A: 

I'm using SSL on Apache, which handles access to our Subversion repository on a Small Windows EC2 instance. In testing, I found that HTTPS access was fractionally slower than HTTP, but that's for the obvious reason that encryption/decryption is not an instantaneous process, as you'd expect.

If your CPU metrics are correct and you're not seeing excessive load, then the implication is that bandwidth is the limiting factor; however, I really can't see why you'd be able to get 700+ Mbps on an HTTP instance compared to only 60 Mbps on an HTTPS instance. Unless the test conditions were not actually identical, of course, and there's something else going on inside the HTTPS instance you haven't factored in...

The larger instances do of course get a better share of the host bandwidth than Smalls - there are fewer of them competing for the resource. Since the internal EC2 network is Gigabit Ethernet, seeing 700Mbps on a Large instance is feasible assuming no other Large instances on the same node were making similar bandwidth demands. To get that out of a Small instance, you'd have to be really fortunate to be running inside a very lightly-loaded host. And in that case, there'd be no guarantee that you'd keep that performance level - as soon as other Smalls came online, your share of the available bandwidth is going to start dropping.

I think this is essentially a Small instance bandwidth issue - adding more Smalls won't necessarily help much, because you have no control over which host they spin up on; Large instances, however, get a bigger slice of the bandwidth pie and are therefore likely to have more consistent capacity available.

Jonners
+2  A: 

Are you using the SSL session cache that nginx provides? That can save nginx cycles by letting returning clients resume sessions instead of redoing the full SSL handshake every time. See http://wiki.nginx.org/NginxHttpSslModule#ssl_session_cache
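
A minimal example of what that looks like in the ssl-enabled server block (sizes and timeouts are illustrative, not tuned values):

    ssl_session_cache    shared:SSL:10m;
    ssl_session_timeout  10m;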

What monitoring are you using to determine your CPU usage? SSL is typically very CPU intensive.
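
If the graphs average across cores, it's worth checking per-core; a single nginx worker doing SSL can peg one core while the overall average still looks low. For example:

    mpstat -P ALL 1    # per-core usage, from the sysstat package
    # or run top and press "1" to break the display out per core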

I would keep the SSL proxies as a dedicated layer; that way you can scale the cost of SSL negotiation separately from your other concerns.

joshsz