Puma serves requests with significantly higher latency on HTTP/1.1 (keepalive) compared to HTTP/1.0 #3443

@amitsaxena

Description

Describe the bug
While benchmarking our app against a similar Heroku and AWS stack during an AWS migration, we encountered latencies on AWS more than 20x higher than what we were getting on Heroku. Our AWS setup consists of an Application Load Balancer + ECS + Fargate. We have ensured there is no CPU or memory contention on the Fargate instance, yet Puma responds to requests painfully slowly when hit with higher concurrency. After several days of debugging we have concluded that the load balancer's keep-alive setting is causing the significant difference in latency. We were eventually able to reproduce the issue by hitting the public IP of the container directly with keep-alive (HTTP/1.1) and without it (HTTP/1.0). Below are a couple of load tests showing the difference in latencies:

$ ab -n 5000 -c 100 http://xx.xx.xx.xx:9292/health/ping          
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Document Path:          /health/ping
Document Length:        61 bytes

Concurrency Level:      100
Time taken for tests:   6.857 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      2720000 bytes
HTML transferred:       305000 bytes
Requests per second:    729.16 [#/sec] (mean)
Time per request:       137.144 [ms] (mean)
Time per request:       1.371 [ms] (mean, across all concurrent requests)
Transfer rate:          387.37 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       44   58  28.0     48    1074
Processing:    46   76  20.1     69     283
Waiting:       46   76  20.1     69     280
Total:         93  134  38.8    121    1174

Percentage of the requests served within a certain time (ms)
  50%    121
  66%    133
  75%    144
  80%    154
  90%    188
  95%    206
  98%    228
  99%    245
 100%   1174 (longest request)

$ ab -k -n 5000 -c 100 http://xx.xx.xx.xx:9292/health/ping
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Document Path:          /health/ping
Document Length:        61 bytes

Concurrency Level:      100
Time taken for tests:   115.548 seconds
Complete requests:      5000
Failed requests:        0
Keep-Alive requests:    4521
Total transferred:      2828504 bytes
HTML transferred:       305000 bytes
Requests per second:    43.27 [#/sec] (mean)
Time per request:       2310.952 [ms] (mean)
Time per request:       23.110 [ms] (mean, across all concurrent requests)
Transfer rate:          23.91 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  21.1      0    1044
Processing:    45 2203 6208.9     47   22969
Waiting:       45 2203 6208.9     47   22969
Total:         45 2209 6223.8     47   23013

Percentage of the requests served within a certain time (ms)
  50%     47
  66%     47
  75%     48
  80%     48
  90%  12627
  95%  21117
  98%  22017
  99%  22520
 100%  23013 (longest request)

Puma config:

# frozen_string_literal: true

max_threads_count = ENV.fetch("RAILS_MAX_THREADS", 1)
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count

# Specifies the `port` that Puma will listen on to receive requests; default is 3002.
port ENV.fetch("PORT", 3002)

environment ENV.fetch("RAILS_ENV", "development")

workers ENV.fetch("WEB_CONCURRENCY", 1)

preload_app!

On the Fargate container, the environment variables are set as follows:

RAILS_MAX_THREADS=1
PORT=9292
RAILS_ENV=production
WEB_CONCURRENCY=2 # Fargate container has 2 vCPUs
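
With these values Puma runs 2 workers with 1 thread each, so at most 2 requests can be in flight at any moment, while the benchmarks in this report open 100 concurrent connections. A quick back-of-the-envelope check (an illustrative sketch, not part of the deployed config):

# capacity_check.rb - illustrative only
workers_count = Integer(ENV.fetch("WEB_CONCURRENCY", 1))    # => 2
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 1))  # => 1
puts workers_count * threads_count                          # => 2 requests processed concurrently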

To Reproduce
I was able to reproduce the issue with the simple Rack app from the bug template, as shown below.

WEB_CONCURRENCY=2 RAILS_MAX_THREADS=1 PORT=9292 RAILS_ENV=production bundle exec puma -C puma.rb hello.ru

# Gemfile
source 'https://rubygems.org'

gem 'puma'
gem 'rack'

# Gemfile.lock
GEM
  remote: https://rubygems.org/
  specs:
    nio4r (2.7.3)
    puma (6.4.2)
      nio4r (~> 2.0)
    rack (3.1.7)

PLATFORMS
  arm64-darwin-23
  ruby

DEPENDENCIES
  puma
  rack

BUNDLED WITH
   2.5.14
   
 
# puma.rb

# frozen_string_literal: true

max_threads_count = ENV.fetch("RAILS_MAX_THREADS", 1)
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count

port ENV.fetch("PORT", 3002)

environment ENV.fetch("RAILS_ENV", "development")

workers ENV.fetch("WEB_CONCURRENCY", 1)

preload_app!

# hello.ru

run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }
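
In addition to ApacheBench, a minimal concurrent Net::HTTP client along these lines (a sketch, not part of the original setup; on an affected server it may show a skew similar to the ab -k run) can compare persistent connections against a fresh connection per request:

# keepalive_probe.rb - sketch: many client threads, either reusing one
# persistent (keep-alive) connection per thread or opening a fresh
# connection for every request.
require "net/http"
require "benchmark"

HOST, PORT, PATH = "127.0.0.1", 9292, "/"
THREADS, REQUESTS_PER_THREAD = 20, 50

def run(keepalive:)
  Benchmark.realtime do
    Array.new(THREADS) do
      Thread.new do
        if keepalive
          # One TCP connection per thread, reused for all requests (like ab -k).
          Net::HTTP.start(HOST, PORT) do |http|
            REQUESTS_PER_THREAD.times { http.get(PATH) }
          end
        else
          # A fresh TCP connection for every request (like ab without -k).
          REQUESTS_PER_THREAD.times do
            Net::HTTP.start(HOST, PORT) { |http| http.get(PATH) }
          end
        end
      end
    end.each(&:join)
  end
end

puts format("fresh connections:      %.2fs", run(keepalive: false))
puts format("persistent connections: %.2fs", run(keepalive: true))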

Below are the ApacheBench results.

Without keep-alive:

$ ab -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        
Server Hostname:        0.0.0.0
Server Port:            9292

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      100
Time taken for tests:   0.199 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      380000 bytes
HTML transferred:       55000 bytes
Requests per second:    25158.37 [#/sec] (mean)
Time per request:       3.975 [ms] (mean)
Time per request:       0.040 [ms] (mean, across all concurrent requests)
Transfer rate:          1867.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0       3
Processing:     1    4   0.6      4       8
Waiting:        0    4   0.6      4       8
Total:          3    4   0.9      4       9

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      6
  98%      8
  99%      8
 100%      9 (longest request)

With keep-alive:

$ ab -k -n 5000 -c 100 http://0.0.0.0:9292/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 0.0.0.0 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        
Server Hostname:        0.0.0.0
Server Port:            9292

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      100
Time taken for tests:   10.171 seconds
Complete requests:      5000
Failed requests:        0
Keep-Alive requests:    4511
Total transferred:      488264 bytes
HTML transferred:       55000 bytes
Requests per second:    491.58 [#/sec] (mean)
Time per request:       203.424 [ms] (mean)
Time per request:       2.034 [ms] (mean, across all concurrent requests)
Transfer rate:          46.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       6
Processing:     0  103 818.5      0   10042
Waiting:        0  103 818.5      0   10042
Total:          0  103 818.5      0   10042

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%     21
  95%     22
  98%     46
  99%   5129
 100%  10042 (longest request)

You can clearly see that the average latency is dramatically higher with HTTP/1.1 / the keep-alive option: mean time per request jumps from roughly 4 ms to roughly 203 ms, and the slowest requests take over 10 seconds.

Expected behavior
Comparable latencies with or without keep-alive, or ideally better latencies with the keep-alive option, since connection setup is avoided.
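
As a stopgap while this is investigated, the keep-alive related knobs in puma.rb can be tuned; the snippet below is a hedged sketch, not an official fix, and which options exist depends on the Puma version in use:

# puma.rb additions - hedged workaround sketch, check your Puma version's DSL
# Serve fewer requests per burst from a single keep-alive socket before the
# connection is handed back to the reactor (the default is 10).
max_fast_inline 1

# Newer Puma releases expose a switch to refuse keep-alive entirely, so every
# connection behaves like HTTP/1.0 from the server's point of view; guarded
# here so older versions do not raise NoMethodError.
enable_keep_alives false if respond_to?(:enable_keep_alives)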

Desktop (please complete the following information):

  • OS: Mac
  • Puma Version: 6.4.2
