Skip to content

Conversation

@easwars
Copy link
Contributor

@easwars easwars commented Jun 9, 2025

The xDS client is currently making an incorrect assumption that can lead to panics.

When an xDS client authority receives a response from a server, it checks to see if the response is from the currently active channel. If not, it assumes that it is from a higher priority server (because when a response is received from a server, all lower priority servers are closed), and therefore expects that a channel to that server exists. This need not be true because the responses are not handled inline, but are handled in the serializer. Here is a possible sequence:

  • Authority contains servers A, B, C (in priority order).
  • Let's say A and B are DOWN, and C is UP. Therefore the active channel is now C.
  • At time T0, the authority receives a message from server B. This is not handled inline, but is queued in the serializer for handling.
  • At time T1, the authority receives a message from server C. This is not handled inline, but is queued in the serializer for handling.
  • At time T2, the serializer handles the message from server B and as part of this handling, releases the reference to server C as it has now seen a response from a higher priority server. Also, the active channel now becomes B.
  • At time T3, the serializer handles the message from server C, and implicitly (wrongly) assumes that since it is not from the currently active channel B, it must be from a higher priority server and expects a channel to be present for that server.

This fix ensures what when the authority receives a response from a server that is not currently the active channel, it must ensure that it is from a higher priority server before proceeding with handling that response. If it is not from a higher priority server, it should drop the response on the floor and move on.

RELEASE NOTES:

  • xdsclient: Fixed a rare panic caused by processing a response from a closed server.
@easwars easwars requested a review from arjan-bal June 9, 2025 18:16
@easwars
Copy link
Contributor Author

easwars commented Jun 9, 2025

@easwars easwars added this to the 1.74 Release milestone Jun 9, 2025
@easwars
Copy link
Contributor Author

easwars commented Jun 9, 2025

@purnesh42H : FYI since the migration PR got rolled back. If this goes in before that eventual roll-forward, we need to ensure that we don't lose track of this fix. Thanks.

@codecov
Copy link

codecov bot commented Jun 9, 2025

Codecov Report

❌ Patch coverage is 27.77778% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.39%. Comparing base (7238ab1) to head (7883084).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/clients/xdsclient/authority.go 27.77% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8389      +/-   ##
==========================================
- Coverage   82.48%   82.39%   -0.10%     
==========================================
  Files         413      413              
  Lines       40511    40518       +7     
==========================================
- Hits        33417    33384      -33     
- Misses       5737     5773      +36     
- Partials     1357     1361       +4     
Files with missing lines Coverage Δ
xds/internal/clients/xdsclient/authority.go 76.93% <27.77%> (-2.50%) ⬇️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@easwars easwars merged commit 8b61e8f into grpc:master Jul 29, 2025
15 checks passed
@easwars easwars deleted the xdsclient_crash branch July 29, 2025 23:26
dimpavloff pushed a commit to dimpavloff/grpc-go that referenced this pull request Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

3 participants