For learning purposes I've written a simple TCP proxy in Erlang. It works, but I experience an odd performance fall-off when I use ab (Apache Bench) to make many concurrent requests. It's not the fall-off per se that puzzles me but its scale. The backend is nginx as the web server; my proxy sits in between ab and nginx.

This is the code of my proxy:

-module(proxy).
-export([start/3]).

%% Listen on InPort and spawn the first acceptor.
start(InPort, OutHost, OutPort) ->
  {ok, Listen} = gen_tcp:listen(InPort, [binary, {packet, 0}, {active, once}]),
  spawn(fun() -> connect(Listen, OutHost, OutPort) end).

%% Accept a client, spawn the next acceptor, connect to the backend and relay.
connect(Listen, OutHost, OutPort) ->
  {ok, Client} = gen_tcp:accept(Listen),
  spawn(fun() -> connect(Listen, OutHost, OutPort) end),
  {ok, Server} = gen_tcp:connect(OutHost, OutPort, [binary, {packet, 0}, {active, once}]),
  loop(Client, Server).

%% Relay data in both directions until either side closes.
loop(Client, Server) ->
  receive
    {tcp, Client, Data} ->
      gen_tcp:send(Server, Data),
      inet:setopts(Client, [{active, once}]),
      loop(Client, Server);
    {tcp, Server, Data} ->
      gen_tcp:send(Client, Data),
      inet:setopts(Server, [{active, once}]),
      loop(Client, Server);
    {tcp_closed, _} ->
      ok
  end.
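
The proxy is started from the Erlang shell along these lines (the backend port 8080 is only a placeholder; it should be whatever port nginx actually listens on):

1> c(proxy).
2> proxy:start(80, "localhost", 8080).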

Firing 64 sequential requests at my proxy, I get a very good result.

ab -n 64 127.0.0.1:80/

Concurrency Level:      1
Time taken for tests:   0.097 seconds
Complete requests:      64
Failed requests:        0
Write errors:           0
Total transferred:      23168 bytes
HTML transferred:       9664 bytes
Requests per second:    659.79 [#/sec] (mean)
Time per request:       1.516 [ms] (mean)
Time per request:       1.516 [ms] (mean, across all concurrent requests)
Transfer rate:          233.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       1
Processing:     1    1   0.5      1       2
Waiting:        0    1   0.4      1       2
Total:          1    1   0.5      1       2

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      2
 100%      2 (longest request)

It's just a little slower than using Apache Bench directly against nginx.

But when I fire 64 concurrent requests at the proxy, performance drops dramatically:

ab -n 64 -c 64 127.0.0.1:80/

Concurrency Level:      64
Time taken for tests:   2.011 seconds
Complete requests:      64
Failed requests:        0
Write errors:           0
Total transferred:      23168 bytes
HTML transferred:       9664 bytes
Requests per second:    31.82 [#/sec] (mean)
Time per request:       2011.000 [ms] (mean)
Time per request:       31.422 [ms] (mean, across all concurrent requests)
Transfer rate:          11.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   31 121.7      0     501
Processing:     3 1135 714.4   1001    2006
Waiting:        3 1134 714.3   1000    2005
Total:          3 1167 707.8   1001    2006

Percentage of the requests served within a certain time (ms)
  50%   1001
  66%   1502
  75%   2003
  80%   2004
  90%   2005
  95%   2005
  98%   2005
  99%   2006
 100%   2006 (longest request)

What/where is the problem? I expected lower performance, but why this much? Look at the requests per second!

It doesn't seem to matter much whether I give erl a lot of threads using +A. I even tried SMP, but the results are almost the same.

My setup: Windows 7 64-bit, Intel quad-core, 8 GB RAM. I get similar results on Ubuntu using 128 concurrent requests.

EDIT: New insight: the total number of requests doesn't matter, only the number of concurrent requests does.

+1  A: 

Have you tried the same tests directly against nginx? If it's not configured correctly, it can also exhibit a performance drop like that.

Javier
Yes, I did. Using ab directly against nginx with 16 concurrent requests I get excellent results: 100% of the requests are served in under 5 ms.
Fair Dinkum Thinkum
+1  A: 

I'm unable to replicate your results. I tried your tests using Apache, Yaws, and nginx as the web servers and found very little variation between runs with and without the proxy for any of them. I did run them on Linux, so maybe it's a problem with Windows or the Windows version of the Erlang VM.

klm
You are right. It performs much better on Linux (Ubuntu). Even though it runs in VirtualBox, it performs better than on Windows :) BUT when I make 128 concurrent requests the problem shows on Linux too. It's odd ... 900 of 1000 calls are served fast, but the last 100 are served extremely slowly (> 3000 ms). I assume a memory/mailbox problem ... maybe the garbage collector isn't fast enough to clean up.
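
One way to check the mailbox theory is to list the message-queue lengths of all processes from the Erlang shell while ab is running (a generic one-liner, nothing specific to the proxy):

%% Show every process whose mailbox is non-empty.
[{P, N} || P <- erlang:processes(),
           {message_queue_len, N} <- [erlang:process_info(P, message_queue_len)],
           N > 0].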
Fair Dinkum Thinkum
+2  A: 

This part of connect/3 is serial:

connect(Listen, OutHost, OutPort) ->
  {ok, Client} = gen_tcp:accept(Listen),
  spawn(fun() -> connect(Listen, OutHost, OutPort) end),

You can't accept a new connection until the newly spawned process doing gen_tcp:accept/1 is ready. This can become a bottleneck in your code. You could try a pool of "acceptors" to improve performance in this case. I would also add a catch-all clause to the receive in loop/2 to avoid the mailbox accidentally filling up.
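
A minimal sketch of such an acceptor pool, reusing the loop/2 from your question (the pool size of 8 and the socket options are illustrative):

%% Acceptor pool: N processes all block in gen_tcp:accept/1 on the same
%% listen socket, so a new client can be accepted even while another
%% process is still busy with gen_tcp:connect/3 to the backend.
start_pool(InPort, OutHost, OutPort) ->
  {ok, Listen} = gen_tcp:listen(InPort, [binary, {packet, 0}, {active, once}]),
  [spawn(fun() -> acceptor(Listen, OutHost, OutPort) end) || _ <- lists:seq(1, 8)],
  ok.

acceptor(Listen, OutHost, OutPort) ->
  {ok, Client} = gen_tcp:accept(Listen),
  %% Keep the pool size constant: respawn before serving this connection.
  spawn(fun() -> acceptor(Listen, OutHost, OutPort) end),
  {ok, Server} = gen_tcp:connect(OutHost, OutPort, [binary, {packet, 0}, {active, once}]),
  loop(Client, Server).

The catch-all clause can be as simple as an extra _Other -> loop(Client, Server) branch at the end of the receive in loop/2.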

Also, what are your erl parameters? Are +A threads and +K true involved?
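
For example (the numbers are only a starting point; +K true enables kernel poll and +A sets the size of the async thread pool):

erl +K true +A 50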

Hynek -Pichi- Vychodil
I'm not sure how to implement a pool of acceptors. Naively, I spawned 4 connect processes from the start function (spawn(fun() -> connect(Listen, OutHost, OutPort) end)). No effect. I added the catch-all clause. No effect. I tried +A 4. No effect. +K isn't supported on my Windows. I'll try it on my Linux VM.
Fair Dinkum Thinkum
I don't know Windows; I haven't used it even on the desktop for ten years. Anyway, I think the number after +A should be a bit bigger. It is the number of threads doing asynchronous I/O, so I would try 50 or 100. Also, how many requests per second are you serving? You show only latencies, not throughput. You may already be at high values and simply unable to get more in your circumstances.
Hynek -Pichi- Vychodil
I'm on a Mac Pro now, but high load is a problem here too. It serves 1000 requests per second. I had a quick test with +A 100 and it seems it improved the latency.
Fair Dinkum Thinkum