[release-3.5] Learner promotion isn't persisted to v3store (bbolt) #19557

@neolit123

See summary: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb

What's the symptom

When upgrading etcd from v3.5.x to v3.6.0-rc.x, the upgrade may fail with "membership: too many learner members in cluster", because etcd allows at most 1 learner by default.

Thanks @neolit123 for uploading the initial log.

Which versions are impacted

Only release-3.5 has this issue. It was introduced in v3.5.1 by #13348, so all etcd patch versions from v3.5.1 through v3.5.19 are affected.

If you ever added and promoted learner(s) on v3.5.1 - v3.5.19 and then try to upgrade to v3.6.0-rc.x, you will see this issue:

- If you added and promoted only one learner, that member becomes a learner again after upgrading to v3.6.0-rc.x.
- If you added and promoted multiple learners (>=2), the upgrade fails, because etcdserver crashes on bootstrap with "membership: too many learner members in cluster".
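The two outcomes above follow from a simple count of members flagged as learners. The sketch below is a minimal illustration, not etcd's actual code: the `Member` type and `validateLearners` function are hypothetical names, and only the error text and the default limit of 1 come from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a simplified, hypothetical stand-in for a membership record.
type Member struct {
	ID        uint64
	IsLearner bool
}

// errTooManyLearners carries the message quoted in this issue.
var errTooManyLearners = errors.New("membership: too many learner members in cluster")

// validateLearners returns an error when more than maxLearners
// members are flagged as learners.
func validateLearners(members []Member, maxLearners int) error {
	n := 0
	for _, m := range members {
		if m.IsLearner {
			n++
		}
	}
	if n > maxLearners {
		return errTooManyLearners
	}
	return nil
}

func main() {
	// Members 2 and 3 were promoted on 3.5, but their stale records
	// still carry the learner flag.
	stale := []Member{{ID: 1}, {ID: 2, IsLearner: true}, {ID: 3, IsLearner: true}}

	// Two stale learner records exceed the default limit of 1.
	fmt.Println(validateLearners(stale, 1))

	// A single stale record stays within the limit, so bootstrap
	// would proceed with that member treated as a learner again.
	fmt.Println(validateLearners(stale[:2], 1))
}
```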

What's the root cause

When a learner is promoted, the change is persisted only in v2store, not in v3store. The write to v3store is rejected with errMemberAlreadyExist, because the member (learner) ID already exists there. See the 3.5 code below:

etcd/server/etcdserver/api/membership/store.go

Lines 57 to 59 in 1810af3

    if unsafeMemberExists(tx, mkey) {
    	return errMemberAlreadyExist
    }

As a result, the membership data becomes inconsistent between v2store and v3store after a promotion.
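To make the divergence concrete, here is a toy in-memory model of the behavior described above. It is a sketch under assumptions: the `store` type and the `saveIfAbsent`, `save` and `promote` helpers are hypothetical and only mimic the "refuse to overwrite an existing member" check. They are not etcd's real storage code.

```go
package main

import (
	"errors"
	"fmt"
)

type member struct {
	ID        uint64
	IsLearner bool
}

var errMemberAlreadyExist = errors.New("member already exists")

// store is a toy stand-in for either v2store or the bbolt members bucket.
type store map[uint64]member

// saveIfAbsent models the 3.5 check: an existing key is never overwritten.
func (s store) saveIfAbsent(m member) error {
	if _, ok := s[m.ID]; ok {
		return errMemberAlreadyExist
	}
	s[m.ID] = m
	return nil
}

// save overwrites unconditionally.
func (s store) save(m member) { s[m.ID] = m }

// promote clears the learner flag in v2 and attempts the same write in v3
// through saveIfAbsent, which is where the update gets dropped.
func promote(v2, v3 store, id uint64) error {
	m := v2[id]
	m.IsLearner = false
	v2.save(m)
	return v3.saveIfAbsent(m)
}

func main() {
	v2, v3 := store{}, store{}
	learner := member{ID: 2, IsLearner: true}
	v2.save(learner)
	_ = v3.saveIfAbsent(learner) // adding the learner succeeds in both stores

	err := promote(v2, v3, 2)
	fmt.Println("v3 write error:", err)
	fmt.Println("v2store isLearner:", v2[2].IsLearner)
	fmt.Println("v3store isLearner:", v3[2].IsLearner)
}
```

In this model the fix amounts to letting the promotion path overwrite the existing v3 record instead of going through the "absent only" check.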

Why do we only see this issue when upgrading from 3.5 to 3.6?

In 3.5, the v2store is the source of truth for the membership data. In 3.6, v3store (bbolt) is the source of truth for the membership data. When upgrading from 3.5.x to 3.6, the source of truth changes.

How to reproduce this issue

Manual steps

Note: run these steps with a 3.5.x binary (x >= 1).

Step 1: start an etcd instance

$ ./bin/etcd --name e1 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://127.0.0.1:2379 --initial-cluster "e1=http://127.0.0.1:2380" --initial-cluster-state new

Step 2: add a learner in another terminal

$ ./bin/etcdctl member add e2 --peer-urls=http://127.0.0.1:2382 --learner

Step 3: start the learner

$ ./bin/etcd --name e2 --initial-advertise-peer-urls http://127.0.0.1:2382  --listen-peer-urls http://127.0.0.1:2382 --advertise-client-urls http://127.0.0.1:2378 --listen-client-urls http://127.0.0.1:2378 --initial-cluster "e1=http://127.0.0.1:2380,e2=http://127.0.0.1:2382" --initial-cluster-state existing

Step 4: promote the learner

$ ./bin/etcdctl member promote 155a4a14c50481b8

Step 5: stop both etcd instances

Step 6: check the bbolt db file directly

Use the etcd-dump-db tool to inspect the db file. You will see that the already promoted learner is still recorded as a learner.

$ ./etcd-dump-db  iterate-bucket ../../e1.etcd/ members
key="e610623c040f129c", value="{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"
key="b71f75320dc06a6c", value="{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"
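If you want to check many records, the quoted values printed above can be decoded programmatically. The following is a small sketch that assumes the value column is a Go-style quoted JSON string, as it appears in the output above. The `dumpedMember` type and `parseDumpValue` helper are hypothetical names, with fields taken from the keys visible in the dump.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// dumpedMember holds the JSON fields visible in the dump output above.
type dumpedMember struct {
	ID        uint64   `json:"id"`
	PeerURLs  []string `json:"peerURLs"`
	Name      string   `json:"name"`
	IsLearner bool     `json:"isLearner"`
}

// parseDumpValue unquotes one printed value and decodes the JSON inside it.
func parseDumpValue(quoted string) (dumpedMember, error) {
	var m dumpedMember
	raw, err := strconv.Unquote(quoted)
	if err != nil {
		return m, err
	}
	err = json.Unmarshal([]byte(raw), &m)
	return m, err
}

func main() {
	// The two values copied verbatim from the dump above.
	values := []string{
		`"{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"`,
		`"{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"`,
	}
	for _, v := range values {
		m, err := parseDumpValue(v)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("id=%x peerURLs=%v isLearner=%v\n", m.ID, m.PeerURLs, m.IsLearner)
	}
}
```

Any record that reports isLearner=true for a member you have already promoted indicates the inconsistency described in this issue.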

Automatic step

Run upgrade_test.sh (you need to download both the v3.5.19 and v3.6.0-rc.2 binaries beforehand), then check the log. You will see the error message "membership: too many learner members in cluster".

Proposed solution & actions

Proposal for release-3.5

Fix the bug described above (the code is pasted again below), following the main branch's implementation. Also add an e2e test to verify that the membership data is consistent between v2store and v3store. We should probably add this verification to production code as well, but enable it only in tests.

etcd/server/etcdserver/api/membership/store.go

Lines 57 to 59 in 1810af3

    if unsafeMemberExists(tx, mkey) {
    	return errMemberAlreadyExist
    }

Provide etcdutl commands to
- check the membership data differences between v2store and v3store.
  - Something like etcdutl check members --data-dir path-2-data-dir
- sync the membership data between v2store and v3store.
  - Something like etcdutl sync members --data-dir path-2-data-dir
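As a rough idea of what the proposed check could report, the sketch below compares two membership views and lists the member IDs that disagree. It is purely illustrative: the etcdutl commands above are still proposals, and the `member` type and `diffMembers` function are hypothetical names operating on plain in-memory maps rather than real v2store or bbolt data.

```go
package main

import (
	"fmt"
	"sort"
)

type member struct {
	ID        uint64
	IsLearner bool
}

// diffMembers returns, in ascending order, the IDs whose records differ
// between the two views or that exist in only one of them.
func diffMembers(v2, v3 map[uint64]member) []uint64 {
	seen := map[uint64]bool{}
	var out []uint64
	for id, m := range v2 {
		seen[id] = true
		if other, ok := v3[id]; !ok || other != m {
			out = append(out, id)
		}
	}
	for id := range v3 {
		if !seen[id] {
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Member 2 was promoted: v2 shows a voting member, v3 still a learner.
	v2 := map[uint64]member{1: {ID: 1}, 2: {ID: 2}}
	v3 := map[uint64]member{1: {ID: 1}, 2: {ID: 2, IsLearner: true}}
	fmt.Println("inconsistent member IDs:", diffMembers(v2, v3))
}
```

A sync command could then rewrite only the reported records from whichever store is treated as authoritative, which is v2store on 3.5 according to this issue.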

Proposal for main & release-3.6.

Make the change on main first, then backport it to release-3.6.

Add an e2e test similar to upgrade_test.sh to cover an upgrade flow like the one kubeadm performs. The rough steps:
- start a one-member (3.5.x) cluster
- add a learner (3.5.x) and promote it to a voting member
- add another learner (3.5.x) and promote it to a voting member
- upgrade the members to 3.6 one by one

We should definitely include this issue in the upgrade (3.5 -> 3.6) checklist. Users should check that the membership data is consistent between v3store and v2store.

We probably also need to publish an official announcement.

cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation @spzala @neolit123 @dims

Next step

- Once the release-3.5 changes are done, release etcd v3.5.20.
- Once the main & release-3.6 changes are done, release etcd v3.6.0-rc.3.
- When both are done, bump etcd to v3.5.20 in K8s. Also verify the upgrade in the K8s workflow, similar to "Testing Upgrade etcd to v3.6.0-rc.1" (kubernetes/kubernetes#130583).

Let's discuss this next Monday, including who works on what.
