Add growing member test scenario #20239
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: nwnt. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
@serathius Not sure if this is the same as what you have in mind. I probably missed a lot of things, so I would definitely love your feedback on it.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

@@           Coverage Diff           @@
##             main   #20239   +/-   ##
=======================================
  Coverage        ?   69.44%
=======================================
  Files           ?      418
  Lines           ?    34696
  Branches        ?        0
=======================================
  Hits            ?    24096
  Misses          ?     9219
  Partials        ?     1381

Continue to review the full report in Codecov by Sentry.
MemberReplace          Failpoint = memberReplace{}
MemberDowngrade        Failpoint = memberDowngrade{}
MemberDowngradeUpgrade Failpoint = memberDowngradeUpgrade{}
GrowMember             Failpoint = &growMember{
We are not growing members here :P
We usually talk about the "cluster", not "members", and we call the process "scale up". Maybe ClusterScaleUp?
How about ClusterScaleInAndOut because we're doing both? :)
}
type growMember struct {
	currentSize int
Which "current" you mean? Maybe initialSize.
I no longer need this initialSize. It's now scalingTarget.
}
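For illustration, a rough sketch of how the struct could look after the rename discussed in the thread above. The names scalingTarget and targetSize both come from this conversation, but the semantics given in the comments are assumptions, not the final PR code:

```go
// Sketch only, not the PR's actual code: the failpoint state after
// dropping currentSize/initialSize in favour of scalingTarget.
type growMember struct {
	scalingTarget int // assumed: how many members to remove and add back
	targetSize    int // assumed: cluster size the failpoint drives toward
}
```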
func (f *growMember) Available(config e2e.EtcdProcessClusterConfig, member e2e.EtcdProcess, profile traffic.Profile) bool {
	return f.currentSize < f.targetSize
Here is the tricky part. If you start from a cluster of size 1, then the traffic will talk to the 1 initial member. I would recommend instead starting from 3 members, removing 2, and adding 2 back. This is still tricky because StartNewProc will allocate unique port numbers, so we will need to work around that.
Yep. How about the current one?
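To make the remove-then-re-add idea above concrete, here is a minimal sketch using the clientv3 membership API. The package name, helper signature, and the way peer URLs are passed in are assumptions for illustration; the real failpoint has to drive the e2e process framework and work around StartNewProc allocating fresh ports.

```go
package failpoint

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// scaleInAndOut removes two members from a 3-member cluster and then adds
// two members back, so traffic keeps flowing through the surviving member.
// Sketch only: quorum checks and restarting the processes are omitted.
func scaleInAndOut(ctx context.Context, cli *clientv3.Client, removeIDs []uint64, peerURLs []string) error {
	// Scale in: remove two of the three members.
	for _, id := range removeIDs[:2] {
		if _, err := cli.MemberRemove(ctx, id); err != nil {
			return err
		}
	}
	// Scale out: add two members back using their peer URLs; the new
	// processes still have to be started by the test framework.
	for _, url := range peerURLs[:2] {
		if _, err := cli.MemberAdd(ctx, []string{url}); err != nil {
			return err
		}
	}
	return nil
}
```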
Good work on the implementation; implementing a new failpoint is pretty tricky. Got an error: … Problem we are hitting with … Unfortunately, to merge this feature we will need to rewrite …
With #20435 we might be able to test it again. Can you rebase and see what errors we get? I assume we might need to change how errors are handled when reading the WAL.
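If WAL reading does need to become more tolerant, it presumably has to cope with members whose data directories were recreated by the scale-in/out. A rough sketch of such defensive reading with the server's wal package follows; the import paths, the error policy, and the idea that a missing WAL is skipped are assumptions, not what the robustness framework currently does.

```go
package failpoint

import (
	"fmt"

	"go.etcd.io/etcd/server/v3/storage/wal"
	"go.etcd.io/etcd/server/v3/storage/wal/walpb"
	"go.etcd.io/raft/v3/raftpb"
	"go.uber.org/zap"
)

// readMemberWAL reads one member's WAL while tolerating a member that was
// removed and re-added during the test (its WAL may be missing or short).
func readMemberWAL(lg *zap.Logger, walDir string) ([]raftpb.Entry, error) {
	if !wal.Exist(walDir) {
		// Assumption: a re-added member without a WAL is skipped, not fatal.
		return nil, nil
	}
	w, err := wal.Open(lg, walDir, walpb.Snapshot{})
	if err != nil {
		return nil, fmt.Errorf("open wal %q: %w", walDir, err)
	}
	defer w.Close()
	_, _, ents, err := w.ReadAll()
	if err != nil {
		// Whether a partial read is acceptable for validation is the open
		// question raised in the comment above.
		return ents, fmt.Errorf("read wal %q: %w", walDir, err)
	}
	return ents, nil
}
```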
1d4a309 to 86a87df
Signed-off-by: Nont <nont@duck.com>
86a87df to 17b929e
From …
From …
/retest
1 similar comment
/retest
/retest pull-etcd-robustness-arm64
@serathius: The … Use … In response to this: /retest pull-etcd-robustness-arm64
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-etcd-robustness-arm64
@nwnt: The following tests failed, say …

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Should I remove each process's data directory after scaling down during …?
This looks like the condition I should add to …
No, please don't read random logs.
No, this log is related to the fact that membership changes are rejected if there is a suspicion that cluster members are unstable. It requires at least 5 seconds of being connected to all members. This could be mitigated by adding a 5-second sleep before membership changes.
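A minimal sketch of that mitigation, assuming the failpoint issues the membership change through a clientv3 client; the function and parameter names are illustrative only, not the PR's actual code.

```go
package failpoint

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// addMemberAfterSettle waits out the roughly 5-second window in which the
// leader may still consider peers unstable, then issues the member add.
func addMemberAfterSettle(ctx context.Context, cli *clientv3.Client, peerURL string) error {
	time.Sleep(5 * time.Second) // let the leader see all members as connected
	_, err := cli.MemberAdd(ctx, []string{peerURL})
	return err
}
```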
Please read the MemberReplace failpoint; it seems there is some overlap with your case.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
For #20137