I am implementing a simple TCP server using the select() method. Everything works and performance is quite acceptable, but when benchmarking with ab (ApacheBench), the "longest request" time is insanely high compared to the averages:

I am using: ab -n 5000 -c 20 http://localhost:8000/

snippet:

Requests per second:    4262.49 [#/sec] (mean)
Time per request:       4.692 [ms] (mean)
Time per request:       0.235 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      3
  98%      3
  99%      4
 100%    203 (longest request)

and the same against apache:

Requests per second:    5452.66 [#/sec] (mean)
Time per request:       1.834 [ms] (mean)
Time per request:       0.183 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      2
  75%      2
  80%      2
  90%      3
  95%      3
  98%      4
  99%      4
 100%      8 (longest request)

For reference, I am using stream_select(), and the sockets are non-blocking.
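For context, here is a minimal, self-contained sketch of the kind of non-blocking stream_select() readiness check described above. A Unix socket pair stands in for a real client connection, and all names are illustrative, not the asker's actual server code:

```php
<?php
// Hypothetical demo: a Unix socket pair stands in for a real client
// connection so the stream_select() readiness check can be shown
// self-contained.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
list($a, $b) = $pair;
stream_set_blocking($a, false);
stream_set_blocking($b, false);

fwrite($b, "GET / HTTP/1.0\r\n\r\n");   // simulate a client sending a request

$read   = [$a];                          // watch $a for readability
$write  = null;
$except = null;
// 0 s / 200000 us: poll with a 200 ms cap instead of blocking forever
$ready = stream_select($read, $write, $except, 0, 200000);

$data = '';
if ($ready > 0) {
    $data = fread($a, 8192);             // non-blocking read of the request
}
fclose($a);
fclose($b);
echo strlen($data), " bytes ready\n";    // prints "18 bytes ready"
```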

Is this a common effect of using the select() call?
Are there any performance considerations I should worry about?

Update:

When using a concurrency value <= 6, the longest request is "normal" (about 2x or 3x the average), but anything above 6 goes wild (for example, 7 concurrent requests may benchmark the same as 20: around 200 ms).

Update2:

After replacing the stream functions with the equivalent socket functions, and some proper testing/benchmarking, the issue no longer occurs, so I will attribute this behavior to some obscure detail in the PHP implementation of streams.
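A hedged sketch of the socket-extension equivalent the update refers to: socket_select() in place of stream_select(). This assumes the PHP sockets extension is loaded; the AF_UNIX pair is a stand-in for real TCP clients, and the names are illustrative:

```php
<?php
// Hedged sketch: socket_select() in place of stream_select().
// The AF_UNIX pair stands in for real TCP client connections.
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
list($a, $b) = $pair;
socket_set_nonblock($a);
socket_set_nonblock($b);

socket_write($b, "ping");                // simulate client traffic

$read   = [$a];
$write  = null;
$except = null;
// same 200 ms polling cap as the stream version
$ready  = socket_select($read, $write, $except, 0, 200000);

$data = '';
if ($ready > 0) {
    $data = socket_read($a, 8192);       // drain the readable socket
}
socket_close($a);
socket_close($b);
```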

A: 

Since 99% of your requests complete within 4 ms, that tends to implicate a one-off cost, such as a DNS lookup or swapping a large chunk of your code in from disk.

caf
Thank you, that makes sense - however: this happens on localhost/local network too, the code is a single PHP file already running, and the benchmarks were repeated several times...
jcinacio
+1  A: 

You could use Wireshark or another sniffer to track the TCP/IP traffic. That way you can see whether the problem has to do with low-level issues (retransmissions, packet loss, etc.).

Toad
I have considered it, but I have NO clue how to "debug" hundreds of requests... any ideas?
jcinacio
Only one request is the one that takes forever. Make sure every request has a unique ID that is logged client-side, and transmit that ID somewhere in the TCP stream. Once you know which ID is the slowest, search the recorded Wireshark logs for that same ID. Then examine just that request and compare it to the ones that were fast.
Toad
+1  A: 

200ms sounds like a scheduler time quantum.

Just to be sure: are you using a NULL or nonzero timeout for select()? Are you writing to sockets that are only ready for reads, or vice versa? Are you processing every fd that select() returns before calling select() again? It would be really nice to see some code...
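To make those questions concrete, here is a sketch of the loop shape being asked about; every name (serviceReadyFds, $server, $clients) is invented for illustration. The key point is that every fd select() reports ready gets serviced before select() is called again:

```php
<?php
// Sketch of the loop shape the questions above ask about. Key point:
// drain EVERY fd select() reports ready before selecting again.
function serviceReadyFds($server, array &$clients): int
{
    $read   = array_merge([$server], $clients);
    $write  = null;
    $except = null;
    // bounded 200 ms timeout so a stall is visible rather than a hang
    if (stream_select($read, $write, $except, 0, 200000) < 1) {
        return 0;
    }
    $handled = 0;
    foreach ($read as $fd) {                      // handle ALL ready fds
        if ($fd === $server) {
            if ($c = @stream_socket_accept($server, 0)) {
                stream_set_blocking($c, false);   // keep clients non-blocking
                $clients[] = $c;
            }
        } else {
            fread($fd, 8192);                     // ...parse, queue reply...
        }
        $handled++;
    }
    return $handled;
}

// Listening socket on an ephemeral port, for demonstration only;
// a real server wraps serviceReadyFds() in a while (true) loop.
$server = stream_socket_server("tcp://127.0.0.1:0", $errno, $errstr);
stream_set_blocking($server, false);
$clients = [];
```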

I don't think it would be the network if you're testing against localhost. But reinier is right: it looks a lot like what you'd see if there were a TCP retransmit (200 ms is the minimum TCP retransmission timeout on a reasonably modern Linux).

Keith Randall
OK, I have managed to debug the TCP traffic AND I am seeing a retransmission of the initial GET request from the client to the server. Debugging the client connections does indeed show a very short lifetime, so the problem SHOULD be caused by the retransmit. So... any clues on what I could try next?
jcinacio
Retransmission times are typically set globally. On Windows it's done with some obscure registry setting; on Linux I don't know.
Toad