Skip to content

Conversation

marshall-lee
Copy link
Contributor

@marshall-lee marshall-lee commented Aug 24, 2025

SSL_shutdown() call may result in error and in OpenSSL every error should be consumed by the valler via SSL_get_error() or a similar func.

MiniSSL implementation was breaking this convention so it resulted in cryptic bugs like SSL clients (not related to Puma in any way) return errors with text like "SSL_read: shutdown while in init" that is initially generated by the Puma reactor but not consumed by it.

Such bugs are really hard to debug...

Description

Thank you for contributing! You're the best.

We can read your code, so consider leaving some comments here that are more about your motivations and decision making. Some things that may be helpful to address in your description:

  • What original problem led to this PR?
  • Are there related issues / prior discussions?
  • What alternatives have been tried? Does this supersede previous attempts?
  • Why do you make the choices you did? What are the tradeoffs?

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@marshall-lee marshall-lee force-pushed the fix-ssl-shutdown-error-handling branch from a177583 to ce29e27 Compare August 24, 2025 19:25
SSL_shutdown() call may result in error and in OpenSSL every error
should be consumed by the valler via SSL_get_error() or a similar func.

MiniSSL implementation was breaking this convention so it resulted
in cryptic bugs like SSL clients (not related to Puma in any way)
return errors with text like "SSL_read: shutdown while in init" that
is initially generated by the Puma reactor but not consumed by it.

Such bugs are really hard to debug...
@marshall-lee marshall-lee force-pushed the fix-ssl-shutdown-error-handling branch from ce29e27 to c683c6c Compare August 24, 2025 19:38
@github-actions github-actions bot added the waiting-for-review Waiting on review from anyone label Aug 24, 2025
@schneems
Copy link
Contributor

Thanks for the PR! The C extension is probably the hardest area to modify, if you can give us some help validating your change I think this looks like a good quality of life improvement:

Can you give me an example of a change in error output before and after? Ideally, if you have a reproduction that lets me exercise and view the error on my own that would be even better https://www.codetriage.com/reproduction.

I tried pulling in the added tests into master/main I'm unable to use them to reproduce an error as they rely on additions from this PR:

  1) Error:
TestPumaServerSSLClientCloseError#test_client_close_raises_ssl_error_in_http10:
NoMethodError: undefined method 'errors' for an instance of SSLLogWriterHelper
    test/test_puma_server_ssl.rb:77:in 'TestPumaServerSSLClientCloseError#assert_ssl_client_error_match'
    test/test_puma_server_ssl.rb:83:in 'TestPumaServerSSLClientCloseError#test_client_close_raises_ssl_error_in_http10'

  2) Error:
TestPumaServerSSLClientCloseError#test_client_both_read_and_close_raise_ssl_error:
NoMethodError: undefined method 'errors' for an instance of SSLLogWriterHelper
    test/test_puma_server_ssl.rb:77:in 'TestPumaServerSSLClientCloseError#assert_ssl_client_error_match'
    test/test_puma_server_ssl.rb:92:in 'TestPumaServerSSLClientCloseError#test_client_both_read_and_close_raise_ssl_error'

26 runs, 48 assertions, 0 failures, 2 errors, 1 skips

Does this change modify or break an API?

@marshall-lee
Copy link
Contributor Author

@schneems

I tried pulling in the added tests into master/main I'm unable to use them to reproduce an error as they rely on additions from this PR:
Does this change modify or break an API?

No just cherry pick the change in test/helper.rb as well

@marshall-lee
Copy link
Contributor Author

@schneems

Can you give me an example of a change in error output before and after? Ideally, if you have a reproduction that lets me exercise and view the error on my own that would be even better https://www.codetriage.com/reproduction.

Well it's tricky.

Before this change, Puma was not printing anything in the logs if SSL_shutdown() resulted in error because the error is handled.

However, the only bug I was able to catch is when SSL_shutdown() being call too early when the SSL connection is not fully established. This is so called "shutdown while in init" error but in this PR it's also ruled out by SSL_in_init() check. So with this PR the result is the same: "shutdown while in init" will never be printed in the logs.
But I think it's legitimate. Printing an error like "shutdown while in init" is too verbose, lets not do this anyway, it's would be a real noise in the logs. Better to silence it with a legitimate SSL_in_init() check.

To reproduce at least something you could apply the following patch on current master and run some tests:

diff --git a/ext/puma_http11/mini_ssl.c b/ext/puma_http11/mini_ssl.c
index 2af28073..5e104541 100644
--- a/ext/puma_http11/mini_ssl.c
+++ b/ext/puma_http11/mini_ssl.c
@@ -663,11 +663,21 @@ VALUE engine_shutdown(VALUE self) {
   ERR_clear_error();
 
   ok = SSL_shutdown(conn->ssl);
-  if (ok == 0) {
+  // See
+  // https://github.com/openssl/openssl/blob/openssl-3.5.2/ssl/ssl_lib.c#L2792-L2797
+  // for description of SSL_shutdown return values.
+  switch (ok) {
+  case 0:
+    // "close notify" alert is sent by us.
     return Qfalse;
+  case 1:
+    // "close notify" alert was received from peer.
+    return Qtrue;
+  default:
+    raise_error(conn->ssl, ok);
   }
 
-  return Qtrue;
+  return Qnil;
 }
 
 VALUE engine_init(VALUE self) {

This is nearly the same as changes from my PR except the SSL_in_init check. I expect you to see a bunch of shutdown while in init failures in the tests.

@marshall-lee
Copy link
Contributor Author

Besides of "shutdown while in init" (which this PR makes silent by SSL_in_init() check) I don't know any other way how to make SSL_shutdown() return an error. So i'm sorry, it's not easy to break it down into before / after behavior.

@marshall-lee
Copy link
Contributor Author

marshall-lee commented Aug 26, 2025

Well one could ask why do all this if nothing was changed anyway in terms of what error are reported in the logs.

The problem is some OpenSSL library call resulted in error, it's a caller responsibility to consume this error from the error queue.
See the relevant piece of code from Ruby openssl gem: https://github.com/ruby/openssl/blob/v3.3.0/ext/openssl/ossl_ssl.c#L2170-L2176
Puma's raise_error function does this too: https://github.com/puma/puma/blob/v6.6.1/ext/puma_http11/mini_ssl.c#L585

@marshall-lee
Copy link
Contributor Author

marshall-lee commented Aug 26, 2025

If you don't consume the OpenSSL error from the queue, it might break the SSL_get_error() in other places.
For example, SSLSocket#read from openssl gem is affected: https://github.com/ruby/openssl/blob/v3.3.0/ext/openssl/ossl_ssl.c#L1964-L1973

From the SSL_get_error docs https://docs.openssl.org/master/man3/SSL_get_error/:

In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

@marshall-lee
Copy link
Contributor Author

@schneems

Two tests I added in TestPumaServerSSLClientCloseError don't really reproduce the root issue I faced.

I added them just because I also introduced close_client_safely method that handles client.close exceptions. I had to do this because I inserted raise_error into the client.close internals.

@MSP-Greg
Copy link
Member

I had to do this because I inserted raise_error into the client.close internals.

Yes, I didn't notice that right away. This PR seems correct. Reviewing OpenSSL man pages, may add a few tests to show that nothing is logged.

@MSP-Greg
Copy link
Member

Maybe add two tests to TestPumaServerSSL?

def test_http_10_close_no_errors
  start_server

  assert_equal 'https', send_http_read_response(GET_10, ctx: new_ctx).body

  assert_empty @log_stderr.string
end

def test_http_11_close_no_errors
  start_server

  skt = send_http ctx: new_ctx

  assert_equal 'https', skt.read_response.body
  skt.close

  assert_empty @log_stderr.string
end

Also, rather than extract the code to close_client_safely, maybe the following (with 2 added ssl related lines)?

begin
  client.close if close_socket
rescue IOError, SystemCallError
  # Already closed
rescue MiniSSL::SSLError => e
  @log_writer.ssl_error e, client.io
rescue StandardError => e
  @log_writer.unknown_error e, nil, "Client"
end

@marshall-lee
Copy link
Contributor Author

marshall-lee commented Aug 26, 2025

@MSP-Greg

Also, rather than extract the code to close_client_safely, maybe the following (with 2 added ssl related lines)?

It was extracted cause it has more than one call sites

@marshall-lee
Copy link
Contributor Author

marshall-lee commented Aug 26, 2025

@MSP-Greg

Added both test_http_10_close_no_errors and test_http_11_close_no_errors in 0026b95

@MSP-Greg
Copy link
Member

It was extracted cause it has more than one call sites

Sorry, also missed that. Jumping around a bit today...

Can it be added to Client.close? It's an internal object, and its already got the io wrapped in a rescue block?`

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 26, 2025

JFYI, as to my missing that, the new HTTP_10 test is called from Server#process_client, and the HTTP_11 test is called from Server#reactor_wakeup. The two call locations...

Thanks for adding the tests.

@MSP-Greg
Copy link
Member

@marshall-lee

Please ignore the idea of moving code from Server#close_client_safely to Client.close. Client (currently) doesn't have access the log writer...

@github-actions github-actions bot added waiting-for-merge and removed waiting-for-review Waiting on review from anyone labels Aug 26, 2025
@MSP-Greg
Copy link
Member

@schneems

Not sure what you checked. Some of this code is from 25-July-2016, OpenSSL 1.0.2h was the current version. I reviewed this with a few current OpenSSL man pages (see https://docs.openssl.org/master/man3/SSL_shutdown/).

LGTM.

@MSP-Greg MSP-Greg added the bug label Aug 27, 2025
@marshall-lee
Copy link
Contributor Author

@schneems @MSP-Greg Hi! Is there anything else I can do for this to be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants