@mble-sfdc
Contributor

@mble-sfdc mble-sfdc commented Feb 21, 2025

In Ubuntu 24.04, maintainers added the SSL_CERT_FILE symlink for OpenSSL to, correctly, point /usr/lib/ssl/cert.pem to the ca-certificates bundle:

$ ls -lah /usr/lib/ssl/cert.pem
lrwxrwxrwx. 1 root root 34 Aug 20  2024 /usr/lib/ssl/cert.pem -> /etc/ssl/certs/ca-certificates.crt

The downside to this is that OpenSSL prefers loading this file first if found when calling X509_STORE_set_default_paths: https://github.com/openssl/openssl/blob/a1c6e2d1b590dc6a3d2e1c7bd1bf61ffcf854104/crypto/x509/x509_d2.c#L15

This is fine for small bundles, but with large bundles, this materially degrades performance of the set_default_paths call:

$ hyperfine -w10 "ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'" "SSL_CERT_FILE=/dev/null ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'"
Benchmark 1: ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'
  Time (mean ± σ):      63.4 ms ±   1.4 ms    [User: 58.3 ms, System: 4.6 ms]
  Range (min … max):    61.0 ms …  68.8 ms    47 runs

Benchmark 2: SSL_CERT_FILE=/dev/null ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'
  Time (mean ± σ):      13.9 ms ±   0.6 ms    [User: 10.7 ms, System: 3.1 ms]
  Range (min … max):    12.3 ms …  15.4 ms    212 runs

Summary
  SSL_CERT_FILE=/dev/null ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths' ran
    4.55 ± 0.21 times faster than ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'

As a result, every caller that does not cache the result has this lookup latency added. This is especially apparent in libraries that connect to databases and similar services without a connection pool, since they pay the penalty on every connection setup.
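
One way to avoid paying this cost per connection is to load the default store once and share it. A minimal Ruby sketch follows; the `SharedTrustStore` module and `build_ssl_context` helper are hypothetical names used for illustration only and are not part of this PR or of any library.

```ruby
require "openssl"

# Hypothetical helper (an assumption, not from this PR): loads the default
# trust store once per process and returns the same object afterwards.
module SharedTrustStore
  MUTEX = Mutex.new

  def self.store
    MUTEX.synchronize do
      # set_default_paths is only called the first time through.
      @store ||= OpenSSL::X509::Store.new.tap(&:set_default_paths)
    end
  end
end

# Hypothetical helper: every new SSL context reuses the already-loaded store
# instead of calling set_default_paths again.
def build_ssl_context
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_PEER
  ctx.cert_store = SharedTrustStore.store
  ctx
end
```

Whether this is applicable depends on the client library exposing a way to pass in a store or context, which not all of them do.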

Ref: GUS-W-17566700 and an internal Slack thread.

@edmorley edmorley changed the title fix: remove /usr/lib/ssl/cert.pem symlink Feb 24, 2025
@mble-sfdc mble-sfdc force-pushed the mble-sfdc-remove-cert-bundle-symlink branch from 9ad33c4 to 0268798 Compare February 24, 2025 12:14
@mble-sfdc mble-sfdc marked this pull request as ready for review May 7, 2025 13:46
@mble-sfdc mble-sfdc requested a review from a team as a code owner May 7, 2025 13:46
@edmorley edmorley requested a review from dzuelke June 27, 2025 18:11
@edmorley
Member

(Capturing context for future reference)

In Ubuntu 24.04, maintainers added the SSL_CERT_FILE symlink for OpenSSL to, correctly, point /usr/lib/ssl/cert.pem to the ca-certificates bundle:

openssl (3.0.5-3) unstable; urgency=medium

  * Add cert.pem symlink pointing to ca-certificates' ca-certificates.crt
   (Closes: #805646).

See:

@edmorley
Member

Example performance impact on HTTP requests made by an internal app when upgrading from Heroku-22 (which doesn't have the symlink) to Heroku-24 (which does) - Heroku-24 is the time-frame in blue:

image
@edmorley
Member

edmorley commented Jul 29, 2025

Misc notes copied from an internal document:

OpenSSL::X509::Store.new.set_default_paths

loads:

OpenSSL::X509::DEFAULT_CERT_FILE => /usr/lib/ssl/cert.pem
OpenSSL::X509::DEFAULT_CERT_DIR => /usr/lib/ssl/certs -> /etc/ssl/certs

| Path/File | Ubuntu 22.04 | Heroku-22 | Ubuntu 24.04 | Heroku-24 |
| --- | --- | --- | --- | --- |
| DEFAULT_CERT_FILE | n/a | n/a | /etc/ssl/certs/ca-certificates.crt | /etc/ssl/certs/ca-certificates.crt |
| DEFAULT_CERT_DIR | /etc/ssl/certs | /etc/ssl/certs | /etc/ssl/certs | /etc/ssl/certs |
| /etc/ssl/certs | 294 certs | 495 certs | 294 certs | 495 certs |
| /etc/ssl/certs/ca-certificates.crt | 146 certs | 246 certs | 146 certs | 246 certs |
| load duration | 1ms | 2ms | 10ms | 30ms |

Cert counts calculated using:

cat /etc/ssl/certs/ca-certificates.crt | grep "BEGIN CERTIFICATE" | wc -l
ls -l /etc/ssl/certs | wc -l
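
For reference, roughly the same counts can be taken from Ruby. This is a sketch only; `count_pem_certs` and `count_dir_entries` are hypothetical helper names. Note that `ls -l` prints a leading `total` line, so the directory figure from the shell command above will be one higher than what `Dir.children` reports.

```ruby
# Counts PEM certificate blocks in a bundle's text, similar to
# grepping for "BEGIN CERTIFICATE" and counting the matching lines.
def count_pem_certs(pem_text)
  pem_text.scan(/^-----BEGIN CERTIFICATE-----/).length
end

# Counts the entries in a directory, excluding "." and "..".
def count_dir_entries(dir)
  Dir.children(dir).length
end

bundle = "/etc/ssl/certs/ca-certificates.crt"
cert_dir = "/etc/ssl/certs"

# Only print when the paths exist, so the script is safe to run anywhere.
puts "#{bundle}: #{count_pem_certs(File.read(bundle))} certs" if File.exist?(bundle)
puts "#{cert_dir}: #{count_dir_entries(cert_dir)} entries" if Dir.exist?(cert_dir)
```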
@edmorley
Member

edmorley commented Jul 29, 2025

To summarise my understanding of the situation:

  • OpenSSL's X509_STORE_set_default_paths attempts to load certs from multiple sources, including:
    • a certs file (/usr/lib/ssl/cert.pem unless overridden by the env var SSL_CERT_FILE)
    • a certs directory (/etc/ssl/certs unless overridden by the env var SSL_CERT_DIR)
  • For Ubuntu 22.04 and older, /usr/lib/ssl/cert.pem didn't exist, meaning certs were only loaded from /etc/ssl/certs.
  • Starting in Ubuntu 24.04, /usr/lib/ssl/cert.pem now exists as a symlink to /etc/ssl/certs/ca-certificates.crt, which was added in order to try and fix upstream Debian bug #805646. (Though reading that bug report, it seems there were other solutions there and maybe this wasn't the most appropriate fix?)
  • This now means that set_default_paths() loads certs twice, once from ca-certificates.crt and once from /etc/ssl/certs
  • This not only causes duplication (there will be overlap between the two), but more significantly OpenSSL is much slower at loading the certs file, since (roughly; I'm hand-waving here) it has to load the whole file, whereas the cert dir design allows for a much faster ~hash based lookup of individual certs on a per request basis
  • For use-cases where the result of set_default_paths isn't cached (e.g. the client/session isn't re-used, or the client calls set_default_paths for every request), this then causes significant performance regressions that are in some cases hard to work around without re-architecting the app (e.g. not all app architectures or languages easily allow for re-using a global client/session).
  • This performance issue exists for stock Ubuntu 24.04 with ca-certificates installed, but is made worse for Ubuntu 24.04 based images where additional certs have been added - such as the RDS certs we added to all Heroku stacks in Add the AWS RDS cert bundles #329 (which increases the number of certs in ca-certificates.crt from 146 to 246).
  • While individual apps can resolve the perf issue by either re-using the client/session (which is a best practice and can give a significant performance boost in general), or by setting the env var SSL_CERT_FILE to the empty string to prevent the loading of the CA certs file, this requires that users know they are running into this issue (e.g. by discovering https://devcenter.heroku.com/articles/heroku-24-stack#changes-to-openssl-certificate-loading-performance). Plus switching to a shared client/session isn't possible for all use-cases as mentioned above.
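
To check which sources apply on a given image, the paths can be inspected from Ruby. The sketch below is illustrative: `default_cert_sources` is a hypothetical helper, and it only mirrors the env var override rules described in the list above rather than asking OpenSSL what it actually loaded. It also treats `SSL_CERT_DIR` as a single path, which is an assumption, since OpenSSL accepts a list of directories there.

```ruby
require "openssl"

# Hypothetical helper: reports the cert file and cert directory that
# set_default_paths would be expected to consult, taking the SSL_CERT_FILE
# and SSL_CERT_DIR overrides into account.
def default_cert_sources(env = ENV)
  file = env.fetch(OpenSSL::X509::DEFAULT_CERT_FILE_ENV, OpenSSL::X509::DEFAULT_CERT_FILE)
  dir  = env.fetch(OpenSSL::X509::DEFAULT_CERT_DIR_ENV, OpenSSL::X509::DEFAULT_CERT_DIR)
  {
    file: file,
    file_present: File.exist?(file), # follows symlinks such as cert.pem
    dir: dir,
    dir_present: Dir.exist?(dir)
  }
end

p default_cert_sources
```

On an image where the symlink has been removed, `file_present` would be expected to be `false` unless `SSL_CERT_FILE` points elsewhere.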
@mble-sfdc
Contributor Author

@edmorley something worth noting: setting SSL_CERT_FILE=/dev/null doesn't play nicely in some circumstances:

$ SSL_CERT_FILE=/dev/null irb
irb(main):001> require "excon"
=> true
irb(main):002> client = Excon.new("https://www.google.com/")
=> 
#<Excon::Connection:197a8 @data={:chunk_size=>1048576, :ciphers=>"ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA38...
irb(main):003> client.options
/Users/matthew.blewitt/.rbenv/versions/3.3.6/lib/ruby/gems/3.3.0/gems/excon-0.98.0/lib/excon/ssl_socket.rb:139:in `initialize': SSL_CTX_load_verify_file: no certificate or crl found (OpenSSL::SSL::SSLError) (Excon::Error::Socket)
@edmorley
Member

@mble-sfdc Ah I didn't know - yet another reason that workaround isn't really viable. Thanks!

Member

@edmorley edmorley left a comment

Thank you for the PR!

In general we try to deviate as little as possible from stock Ubuntu, however, in this case given:
(a) the severity of the impact (for apps that don't reuse clients, which will be a fair number of them),
(b) the fact that as a PaaS we don't control end user app design,
(c) users that run into this may struggle to debug the issue and may miss the Heroku-24 docs explaining it (and think it's a Heroku issue, rather than a wider Ubuntu/Debian/OpenSSL issue),
(d) the fact that even if we get upstream Ubuntu/Debian to fix this, it's presumably unlikely they'll backport it to Ubuntu 24.04 given how long ago it landed upstream (2022)

...then I don't see much alternative to us overriding the Ubuntu defaults here and removing the symlink.

There is a risk that some Heroku-24 using apps are relying on this new behaviour, however:

  1. I think that the risk of removing the symlink is small given how many years Ubuntu hasn't had this symlink,
  2. Any affected apps could presumably set SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt to restore the equivalent of the behaviour when the symlink is present,
  3. We don't really have much other choice (i.e. removing the symlink is the least worst option).

As such, I agree we should make this change.

Signed-off-by: Ed Morley <501702+edmorley@users.noreply.github.com>
@edmorley edmorley merged commit 6fd50f2 into heroku:main Jul 29, 2025
19 checks passed
@edmorley
Member

edmorley commented Jul 29, 2025

Release started (currently only on staging cloud, and Docker images):
https://github.com/heroku/base-images/releases/tag/v156

Testing the old vs new Docker image on my local machine (MacBook Pro laptop with M4 Max):

$ for v in v155 v156; do docker run --rm -itq --user root "heroku/heroku:24.${v}" bash -c "echo '## ${v}' && apt-get update &>/dev/null && apt-get install -yqq --no-install-recommends ruby hyperfine &>/dev/null && hyperfine -w10 \"ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'\""; done
## v155
Benchmark 1: ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'
  Time (mean ± σ):      40.7 ms ±   0.9 ms    [User: 38.7 ms, System: 2.0 ms]
  Range (min … max):    39.6 ms …  45.7 ms    73 runs

## v156
Benchmark 1: ruby --disable='gems,did_you_mean,rubyopt' -ropenssl -e 'OpenSSL::X509::Store.new.set_default_paths'
  Time (mean ± σ):       8.5 ms ±   0.1 ms    [User: 6.8 ms, System: 1.6 ms]
  Range (min … max):     8.1 ms …   8.9 ms    345 runs
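
The hyperfine numbers above include Ruby's startup time. To time only the `set_default_paths` call, an in-process measurement along these lines could be used. This is a rough sketch; `time_default_store_load` is a hypothetical helper, and its results are not directly comparable to the figures above.

```ruby
require "benchmark"
require "openssl"

# Hypothetical helper: times repeated set_default_paths calls on fresh stores
# and returns the mean, min and max duration in milliseconds.
def time_default_store_load(runs = 20)
  samples = Array.new(runs) do
    Benchmark.realtime { OpenSSL::X509::Store.new.set_default_paths }
  end
  {
    mean_ms: samples.sum / runs * 1000,
    min_ms: samples.min * 1000,
    max_ms: samples.max * 1000
  }
end

p time_default_store_load
```

Running it once as-is and once with `SSL_CERT_FILE` overridden gives a before/after comparison on the same image.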
@edmorley
Member

edmorley commented Aug 1, 2025

Interestingly, I just came across https://fedoraproject.org/wiki/Changes/dropingOfCertPemFile which describes how newer Fedora versions will be removing their /etc/pki/tls/cert.pem file/symlink in order to force usage of the directory-hash format by default. (Discovered via pypa/pip#13517 / sethmlarson/truststore#183).

So it seems perhaps this is the direction distros should be moving, and Debian/Ubuntu's addition of the symlink was indeed a step backwards.

@edmorley
Member

edmorley commented Aug 2, 2025

If anyone encounters errors like SSL_CTX_load_verify_file: system lib, it means your app is overriding the OpenSSL default CA certificate file location to point at the now non-existent path. Check for references to ssl_ca_file, ca_file or OpenSSL::X509::DEFAULT_CERT_FILE and remove them.

In general, the CA certificates file and directory locations shouldn’t be hardcoded at the application level; the default library/OS settings should be used instead.
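
As an illustration of relying on the defaults, here is a minimal Ruby sketch using `Net::HTTP` from the standard library. The `https_client` helper name is hypothetical. The point is simply that no `ca_file` or `ca_path` is set, so certificate lookup is left to the library/OS defaults.

```ruby
require "net/http"
require "openssl"

# Hypothetical helper: builds an HTTPS client that verifies peers but does
# not pin a CA file or directory path in application code.
def https_client(host, port = 443)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  # Deliberately no `http.ca_file = ...` or `http.ca_path = ...` here.
  http
end

client = https_client("example.com")
```

No connection is opened until a request is made, so constructing the client is cheap. Reusing one client across requests also avoids repeating the TLS setup.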
