(edit: partially understood and worked around; see the final update below)
I have a setup with nginx acting as a reverse proxy in front of a CherryPy app server. I'm using ab to compare performance when going through nginx versus hitting CherryPy directly, and the nginx case has much worse worst-case latency:
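For context, the backend handler does no work. A minimal sketch of the kind of app being benchmarked (the class and setup here are illustrative, not my actual code, but /noop returns an empty body):

import cherrypy

class Root(object):
    @cherrypy.expose
    def noop(self):
        # Return an empty body so the benchmark measures only
        # connection and proxy overhead, not application work.
        return ''

cherrypy.config.update({
    'server.socket_host': '127.0.0.1',
    'server.socket_port': 8080,
    'server.thread_pool': 10,  # matches the pool size noted below
})
cherrypy.quickstart(Root())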
$ ab -n 200 -c 10 'http://localhost/noop'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: nginx
Server Hostname: localhost
Server Port: 80
Document Path: /noop
Document Length: 0 bytes
Concurrency Level: 10
Time taken for tests: 3.145 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Total transferred: 29600 bytes
HTML transferred: 0 bytes
Requests per second: 63.60 [#/sec] (mean)
Time per request: 157.243 [ms] (mean)
Time per request: 15.724 [ms] (mean, across all concurrent requests)
Transfer rate: 9.19 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     5   48 211.7     31    3007
Waiting:        5   48 211.7     31    3007
Total:          5   48 211.7     31    3007
Percentage of the requests served within a certain time (ms)
50% 31
66% 36
75% 39
80% 41
90% 46
95% 51
98% 77
99% 252
100% 3007 (longest request)
$ ab -n 200 -c 10 'http://localhost:8080/noop'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: CherryPy/3.2.0
Server Hostname: localhost
Server Port: 8080
Document Path: /noop
Document Length: 0 bytes
Concurrency Level: 10
Time taken for tests: 0.564 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Total transferred: 27600 bytes
HTML transferred: 0 bytes
Requests per second: 354.58 [#/sec] (mean)
Time per request: 28.202 [ms] (mean)
Time per request: 2.820 [ms] (mean, across all concurrent requests)
Transfer rate: 47.79 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.7      0      11
Processing:     6   26  23.5     24     248
Waiting:        3   25  23.6     23     248
Total:          6   26  23.4     24     248
Percentage of the requests served within a certain time (ms)
50% 24
66% 27
75% 29
80% 31
90% 34
95% 40
98% 51
99% 234
100% 248 (longest request)
What could be causing this? The only thing I can think of is that nginx is sending requests to the backend in a different order than they arrived, but that seems implausible.
The machine is an EC2 c1.medium instance with 2 cores, CherryPy is using a thread pool with 10 threads, and nginx has worker_connections = 1024.
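For reference, the relevant parts of the nginx config look something like the following sketch. Only worker_processes, worker_connections, and the proxy target are taken from the setup described here; everything else is a guessed default:

worker_processes 2;          # one per core on the c1.medium

events {
    worker_connections 1024;
}

http {
    server {
        listen 80;
        server_name localhost;

        location / {
            # Proxy to the CherryPy backend over loopback TCP;
            # this is the path that shows the 3s worst-case latency.
            proxy_pass http://127.0.0.1:8080;
        }
    }
}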
UPDATE: Two more confusing findings:
- At a given concurrency, sending more requests improves performance. With a concurrency of 40 and 40 total requests, I get a median time of 3s and a max of 10.5s; with a concurrency of 40 and 200 total requests, I get a median of 38ms (!) and a max of 7.5s. In fact, the total time is less for 200 requests! (6.5s, vs. 7.5s for 40.) This is all repeatable.
- Monitoring both of the nginx worker processes with strace greatly improves their performance, taking e.g. the median time from 3s to 77ms, without noticeably changing their behavior. (I tested with a nontrivial API call and confirmed that strace doesn't change the response, and all of these performance observations still hold.) This is also repeatable.
UPDATE 2: I see the same behavior with httperf, so this isn't an artifact of ab being slow and buggy. Changing the backlog (the listen(2) parameter) in CherryPy changes the behavior, and using a UNIX domain socket between nginx and CherryPy eliminates the problem (I've gone with this solution); a sketch of that setup follows.
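For reference, a sketch of the workaround in CherryPy config terms. The socket path and queue size below are placeholders, not the values I tuned to; the point is that the built-in server exposes the listen(2) backlog as server.socket_queue_size, and server.socket_file binds it to a UNIX domain socket instead of a TCP port:

import cherrypy

cherrypy.config.update({
    # Listen on a UNIX domain socket instead of a TCP port
    # (placeholder path); nginx proxies to this socket.
    'server.socket_file': '/tmp/cherrypy.sock',
    # listen(2) backlog; tuning this changed the behavior over TCP.
    'server.socket_queue_size': 128,
    'server.thread_pool': 10,
})

On the nginx side, the matching change is to point the proxy at the socket, e.g. an upstream with server unix:/tmp/cherrypy.sock; in place of the 127.0.0.1:8080 address.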