Description
See summary: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb
What's the symptom
When upgrading etcd from v3.5.x to v3.6.0-rc.x, the upgrade may fail with the error "membership: too many learner members in cluster", because by default etcd allows at most one learner.
Thanks @neolit123 for uploading the initial log.
Which versions are impacted
Note that only release-3.5 has this issue. It was introduced in v3.5.1 by #13348, so all etcd patch versions from v3.5.1 to v3.5.19 are affected.
If you ever added and promoted learner(s) on v3.5.1 - v3.5.19 and then upgrade from 3.5.1+ to v3.6.0-rc.x, you will hit this issue:
- If you added and promoted only one learner, that member becomes a learner again after upgrading to v3.6.0-rc.x.
- If you added and promoted multiple learners (>=2), the upgrade fails, because etcdserver crashes on bootstrap with "membership: too many learner members in cluster".
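The impact rules above can be summarized in a small Go sketch. `isAffected` and `upgradeOutcome` are hypothetical helpers, not part of etcd; they only encode the affected version range (v3.5.1 - v3.5.19) and the learner-count outcomes described in this section, assuming the default limit of one learner.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isAffected reports whether a version string such as "v3.5.19" falls in
// the affected range v3.5.1 - v3.5.19. Only plain "vMAJOR.MINOR.PATCH"
// strings are handled; anything else is treated as not affected.
func isAffected(version string) bool {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) != 3 || parts[0] != "3" || parts[1] != "5" {
		return false
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil {
		return false
	}
	return patch >= 1 && patch <= 19
}

// upgradeOutcome describes the expected result of upgrading to
// v3.6.0-rc.x, given how many learners were added and promoted while
// running an affected 3.5 version (default limit of one learner assumed).
func upgradeOutcome(promotedLearners int) string {
	switch {
	case promotedLearners <= 0:
		return "not affected"
	case promotedLearners == 1:
		return "promoted member becomes a learner again"
	default:
		return "bootstrap fails: too many learner members in cluster"
	}
}

func main() {
	for _, v := range []string{"v3.5.0", "v3.5.1", "v3.5.19", "v3.5.20"} {
		fmt.Printf("%s affected=%v\n", v, isAffected(v))
	}
	fmt.Println(upgradeOutcome(2))
}
```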
What's the root cause
When a learner is promoted, the change is persisted only in v2store, not in v3store. The reason is simple: the v3store write returns errMemberAlreadyExist before updating the record, because the member (learner) ID already exists. See the 3.5 code below:
etcd/server/etcdserver/api/membership/store.go, lines 57 to 59 (commit 1810af3)
As a result, the membership data in v2store and v3store becomes inconsistent.
Why do we only see this issue when upgrading from 3.5 to 3.6?
In 3.5, v2store is the source of truth for the membership data; in 3.6, v3store (bbolt) is. When upgrading from 3.5.x to 3.6, the source of truth switches, so the stale isLearner=true records in v3store take effect and promoted members show up as learners again.
How to reproduce this issue
Manual steps
Note: run these steps with a 3.5.x (x >= 1) binary.
Step 1: start an etcd instance
$ ./bin/etcd --name e1 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://127.0.0.1:2379 --initial-cluster "e1=http://127.0.0.1:2380" --initial-cluster-state new
Step 2: add a learner in another terminal
$ ./bin/etcdctl member add e2 --peer-urls=http://127.0.0.1:2382 --learner
Step 3: start the learner
$ ./bin/etcd --name e2 --initial-advertise-peer-urls http://127.0.0.1:2382 --listen-peer-urls http://127.0.0.1:2382 --advertise-client-urls http://127.0.0.1:2378 --listen-client-urls http://127.0.0.1:2378 --initial-cluster "e1=http://127.0.0.1:2380,e2=http://127.0.0.1:2382" --initial-cluster-state existing
Step 4: promote the learner (use the member ID printed in step 2)
$ ./bin/etcdctl member promote 155a4a14c50481b8
Step 5: stop both etcd instances
Step 6: check the bbolt db file directly
Use the etcd-dump-db tool to inspect the db file. You will see that the already-promoted learner is still marked as a learner.
$ ./etcd-dump-db iterate-bucket ../../e1.etcd/ members
key="e610623c040f129c", value="{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"
key="b71f75320dc06a6c", value="{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"
Automatic step
Run upgrade_test.sh (you need to download both the v3.5.19 and v3.6.0-rc.2 binaries beforehand), then check the log; you will see the error message "membership: too many learner members in cluster".
Proposed solution & actions
Proposal for release-3.5
- Fix the bug mentioned above (code reference repeated below), following the main branch's implementation. Also add an e2e test to verify that the membership data is consistent between v2store and v3store. We should probably also add this verification to the production code, but enable it only in tests.
etcd/server/etcdserver/api/membership/store.go, lines 57 to 59 (commit 1810af3)
- Provide etcdutl commands to:
  - check the membership data differences between v2store and v3store, with something like
    etcdutl check members --data-dir path-2-data-dir
  - sync the membership data between v2store and v3store, with something like
    etcdutl sync members --data-dir path-2-data-dir
Proposal for main & release-3.6
Make the changes in main first, then backport them to release-3.6.
- Add an e2e test, similar to upgrade_test.sh, that covers the upgrade path kubeadm uses. The rough steps:
  - start a one-member (3.5.x) cluster;
  - add a learner (3.5.x), then promote it to a voting member;
  - add another learner (3.5.x), then promote it to a voting member;
  - upgrade the members to 3.6 one by one.
- We should definitely include this issue in the upgrade (3.5->3.6) checklist. Users should check that the membership data is consistent between v3store and v2store.
- We probably need to publish an official announcement.
cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation @spzala @neolit123 @dims
Next steps
- Once the release-3.5 changes are done, release etcd v3.5.20.
- Once the main & release-3.6 changes are done, release etcd v3.6.0-rc.3.
- When both are done, bump etcd to v3.5.20 in K8s. Also verify the upgrade in the K8s workflow, similar to "Testing Upgrade etcd to v3.6.0-rc.1" (kubernetes/kubernetes#130583).
Let's discuss this next Monday, including who works on what.
PRs
- [release-3.5] Fix the learner promotion changes not being persisted into v3store (bbolt) #19563
- [release-3.5] [Solution 1] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store #19586
- [release-3.5] [Solution 2] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store #19606
- [release-3.5] Add e2e test to verify etcd is able to automatically fix the issue #19629
- [release-3.6] Auto sync members in v3store is IsLearner differs between v2 and v3 store #19636
- e2e: add upgrade test for clusters set up by promoted members #19634