Describe the bug
While benchmarking our app against a comparable Heroku stack during an AWS migration, we encountered more than 20x higher latencies on AWS than on Heroku. Our AWS setup consists of an Application Load Balancer + ECS + Fargate. We have verified there is no CPU or memory contention on the Fargate task, yet Puma responds painfully slowly when hit with higher concurrency. After several days of debugging we concluded that the load balancer's keep-alive behavior causes the significant difference in latency. We were eventually able to reproduce the issue by hitting the container's public IP directly with keep-alive (HTTP/1.1) and without it (HTTP/1.0). Below are two load tests showing the difference in latencies:
$ ab -n 5000 -c 100 http://xx.xx.xx.xx:9292/health/ping
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Document Path: /health/ping
Document Length: 61 bytes
Concurrency Level: 100
Time taken for tests: 6.857 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 2720000 bytes
HTML transferred: 305000 bytes
Requests per second: 729.16 [#/sec] (mean)
Time per request: 137.144 [ms] (mean)
Time per request: 1.371 [ms] (mean, across all concurrent requests)
Transfer rate: 387.37 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 44 58 28.0 48 1074
Processing: 46 76 20.1 69 283
Waiting: 46 76 20.1 69 280
Total: 93 134 38.8 121 1174
Percentage of the requests served within a certain time (ms)
50% 121
66% 133
75% 144
80% 154
90% 188
95% 206
98% 228
99% 245
100% 1174 (longest request)
$ ab -k -n 5000 -c 100 http://xx.xx.xx.xx:9292/health/ping
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Document Path: /health/ping
Document Length: 61 bytes
Concurrency Level: 100
Time taken for tests: 115.548 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4521
Total transferred: 2828504 bytes
HTML transferred: 305000 bytes
Requests per second: 43.27 [#/sec] (mean)
Time per request: 2310.952 [ms] (mean)
Time per request: 23.110 [ms] (mean, across all concurrent requests)
Transfer rate: 23.91 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 6 21.1 0 1044
Processing: 45 2203 6208.9 47 22969
Waiting: 45 2203 6208.9 47 22969
Total: 45 2209 6223.8 47 23013
Percentage of the requests served within a certain time (ms)
50% 47
66% 47
75% 48
80% 48
90% 12627
95% 21117
98% 22017
99% 22520
100% 23013 (longest request)
Puma config:
# frozen_string_literal: true
max_threads_count = ENV.fetch("RAILS_MAX_THREADS", 1)
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count
# Specifies the `port` that Puma will listen on to receive requests; default is 3002.
port ENV.fetch("PORT", 3002)
environment ENV.fetch("RAILS_ENV", "development")
workers ENV.fetch("WEB_CONCURRENCY", 1)
preload_app!
On the Fargate container, the environment variables are set as follows:
RAILS_MAX_THREADS=1
PORT=9292
RAILS_ENV=production
WEB_CONCURRENCY=2 # Fargate container has 2vCPUs
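One aside about the config above (not the cause of the bug): `ENV.fetch` returns a String when the variable is set, so `threads` and `workers` receive `"1"` rather than `1`. Puma's DSL appears to coerce these internally, but an explicit `Integer()` makes the intent clear. A sketch of that defensive variant, using a hypothetical `fetch_int` helper and a stubbed env hash rather than the real `ENV`:

```ruby
# Coerce env-derived settings to Integers explicitly.
# ENV.fetch returns a String when the variable is set, so "2" != 2.
def fetch_int(env, key, default)
  Integer(env.fetch(key, default))
end

# Stub of the container's environment from the report above.
env = { "RAILS_MAX_THREADS" => "1", "WEB_CONCURRENCY" => "2" }

max_threads = fetch_int(env, "RAILS_MAX_THREADS", 1)
min_threads = fetch_int(env, "RAILS_MIN_THREADS", max_threads)
workers     = fetch_int(env, "WEB_CONCURRENCY", 1)
```

In a real `puma.rb` these values would then be passed to `threads min_threads, max_threads` and `workers workers` as before.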
To Reproduce
I was able to reproduce the issue with the simple Rack app from the bug template:
WEB_CONCURRENCY=2 RAILS_MAX_THREADS=1 PORT=9292 RAILS_ENV=production bundle exec puma -C puma.rb hello.ru
# Gemfile
source 'https://rubygems.org'
gem 'puma'
gem 'rack'
# Gemfile.lock
GEM
  remote: https://rubygems.org/
  specs:
    nio4r (2.7.3)
    puma (6.4.2)
      nio4r (~> 2.0)
    rack (3.1.7)

PLATFORMS
  arm64-darwin-23
  ruby

DEPENDENCIES
  puma
  rack

BUNDLED WITH
   2.5.14
# puma.rb
# frozen_string_literal: true
max_threads_count = ENV.fetch("RAILS_MAX_THREADS", 1)
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count
port ENV.fetch("PORT", 3002)
environment ENV.fetch("RAILS_ENV", "development")
workers ENV.fetch("WEB_CONCURRENCY", 1)
preload_app!
# hello.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }
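As a sanity check outside Puma, the Rack app in hello.ru is just a lambda and can be called directly (the empty env hash here is a stub, not a full Rack environment):

```ruby
# The same bare Rack app as in hello.ru; call it directly to confirm
# the response triple it returns.
app = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["Hello World"]] }

status, headers, body = app.call({})
# status is 200 and the body enumerable yields "Hello World"
```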
Below are the ApacheBench results.
Without keep alive:
$ ab -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 0.199 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 380000 bytes
HTML transferred: 55000 bytes
Requests per second: 25158.37 [#/sec] (mean)
Time per request: 3.975 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)
Transfer rate: 1867.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.7 0 3
Processing: 1 4 0.6 4 8
Waiting: 0 4 0.6 4 8
Total: 3 4 0.9 4 9
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 4
90% 4
95% 6
98% 8
99% 8
100% 9 (longest request)
With keep alive:
$ ab -k -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software:
Server Hostname: 0.0.0.0
Server Port: 9292
Document Path: /
Document Length: 11 bytes
Concurrency Level: 100
Time taken for tests: 10.171 seconds
Complete requests: 5000
Failed requests: 0
Keep-Alive requests: 4511
Total transferred: 488264 bytes
HTML transferred: 55000 bytes
Requests per second: 491.58 [#/sec] (mean)
Time per request: 203.424 [ms] (mean)
Time per request: 2.034 [ms] (mean, across all concurrent requests)
Transfer rate: 46.88 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 6
Processing: 0 103 818.5 0 10042
Waiting: 0 103 818.5 0 10042
Total: 0 103 818.5 0 10042
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 21
95% 22
98% 46
99% 5129
100% 10042 (longest request)
You can clearly see that latencies are orders of magnitude higher with HTTP/1.1 keep-alive enabled: mean time per request goes from ~4 ms to ~203 ms, with a 10-second worst case.
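The shape of the keep-alive distribution (tiny median, enormous tail) looks like head-of-line blocking: a thread stays pinned to a kept-alive connection, so early connections get served back-to-back while later ones starve. A deterministic toy model of that effect (assumptions: one server thread, C connections opened simultaneously, R sequential requests each, unit service time; this illustrates the queueing pattern, not Puma's actual scheduler):

```ruby
# Toy model: 1 thread, C connections, R requests per connection, service time 1.
# Latency is measured from when each request is issued on its connection.

# keep_alive: the thread stays pinned to one connection until all R of its
# requests are done. Connection i waits i*R units before its first response;
# its follow-up requests then each complete in 1 unit.
def keepalive_latencies(conns, reqs)
  conns.times.flat_map do |i|
    [i * reqs + 1] + Array.new(reqs - 1, 1)
  end
end

# connection-close: every request re-enters a shared FIFO queue, so waiting is
# spread evenly; after the first round, each request waits one full round.
def close_latencies(conns, reqs)
  conns.times.flat_map do |i|
    [i + 1] + Array.new(reqs - 1, conns)
  end
end

ka = keepalive_latencies(100, 5).sort
cl = close_latencies(100, 5).sort
# Keep-alive: median latency is minimal but the tail explodes, the same shape
# as the ab percentiles above (p50 ~47 ms vs p100 ~23 s). Without keep-alive,
# latency is uniform and bounded, matching the flat percentile table.
```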
Expected behavior
Comparable latencies with or without keep-alive, or ideally better latencies with keep-alive, since it avoids per-request connection setup.
Desktop (please complete the following information):
- OS: macOS
- Puma Version: 6.4.2