
[release-3.5] Learner promotion isn't persisted to v3store (bbolt) #19557

Closed
@ahrtr

Description


See summary: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb

What's the symptom

When upgrading etcd from v3.5.x to v3.6.0-rc.x, the upgrade may fail with the error "membership: too many learner members in cluster", because by default etcd allows at most one learner.

Thanks @neolit123 for uploading the initial log.

Which versions are impacted

Note that only release-3.5 has this issue. It was introduced in v3.5.1 by #13348. All etcd patch versions from v3.5.1 to v3.5.19 are affected.

If you have ever added and promoted learner(s) on v3.5.1 - v3.5.19 and then try to upgrade from 3.5.1+ to v3.6.0-rc.x, you will run into this issue:

  • If you added and promoted only one learner, that member becomes a learner again after upgrading to v3.6.0-rc.x;
  • If you added and promoted multiple learners (>=2), the upgrade fails, because etcdserver crashes on bootstrap with "membership: too many learner members in cluster".

What's the root cause

When a learner is promoted, the change is persisted only to v2store, not to v3store. The reason is simple: saving the promoted member to v3store returns errMemberAlreadyExist, because the member (learner) ID already exists there. See the 3.5 code below:

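if unsafeMemberExists(tx, mkey) {
	// The learner's key was already written when it was added, so the
	// promoted (voting) member is never saved to v3store.
	return errMemberAlreadyExist
}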

As a result, the membership data becomes inconsistent between v2store and v3store in this case.
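To make the mechanism concrete, here is a minimal Go simulation, assuming plain maps stand in for v2store and the v3store members bucket. The Member type, stores, saveToV2, saveToV3 and promote are hypothetical names, not etcd's API; only the existence check and errMemberAlreadyExist mirror the 3.5 code quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a simplified, hypothetical stand-in for etcd's membership.Member.
type Member struct {
	ID        uint64
	Name      string
	IsLearner bool
}

var errMemberAlreadyExist = errors.New("member already exists")

// stores models v2store and v3store as plain maps keyed by member ID.
type stores struct {
	v2 map[uint64]Member
	v3 map[uint64]Member
}

// saveToV2 always overwrites the entry, so a promotion is recorded.
func (s *stores) saveToV2(m Member) { s.v2[m.ID] = m }

// saveToV3 mirrors the 3.5 existence check: if the key is already present,
// it returns errMemberAlreadyExist and the new value is never written.
func (s *stores) saveToV3(m Member) error {
	if _, ok := s.v3[m.ID]; ok {
		return errMemberAlreadyExist
	}
	s.v3[m.ID] = m
	return nil
}

// promote clears IsLearner and saves to both stores, returning the v3store error.
func (s *stores) promote(id uint64) error {
	m := s.v2[id]
	m.IsLearner = false
	s.saveToV2(m)
	return s.saveToV3(m)
}

func main() {
	s := &stores{v2: map[uint64]Member{}, v3: map[uint64]Member{}}
	learner := Member{ID: 0xe610623c040f129c, Name: "e2", IsLearner: true}
	s.saveToV2(learner)
	_ = s.saveToV3(learner) // first save succeeds: the key is new

	err := s.promote(learner.ID)
	fmt.Println("promote v3store error:", err)
	fmt.Println("v2store isLearner:", s.v2[learner.ID].IsLearner)
	fmt.Println("v3store isLearner:", s.v3[learner.ID].IsLearner)
}
```

Running it shows v2store recording the promotion while v3store still holds the learner flag, which is exactly the inconsistency described above.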

Why do we only see this issue when upgrading from 3.5 to 3.6?

In 3.5, v2store is the source of truth for the membership data. In 3.6, v3store (bbolt) is the source of truth. When upgrading from 3.5.x to 3.6, the source of truth switches to v3store, so the stale isLearner flag stored there takes effect.
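The bootstrap failure can be sketched in a few lines of Go. This assumes, as described above, that 3.6 reads membership from v3store and rejects more than one learner by default; validateLearners and maxLearners are hypothetical names, and only the error text is taken from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// maxLearners models etcd's default limit of one learner (hypothetical name).
const maxLearners = 1

var errTooManyLearners = errors.New("membership: too many learner members in cluster")

type Member struct {
	ID        uint64
	IsLearner bool
}

// validateLearners is a hypothetical stand-in for the 3.6 bootstrap check,
// which sees the membership as stored in v3store (bbolt), not v2store.
func validateLearners(v3members []Member) error {
	learners := 0
	for _, m := range v3members {
		if m.IsLearner {
			learners++
		}
	}
	if learners > maxLearners {
		return errTooManyLearners
	}
	return nil
}

func main() {
	// Two learners were promoted on 3.5.x, but v3store still marks both as learners.
	v3 := []Member{{ID: 1}, {ID: 2, IsLearner: true}, {ID: 3, IsLearner: true}}
	fmt.Println(validateLearners(v3))
}
```

With a single stale learner the check passes, which matches the first impact case: the member silently becomes a learner again instead of failing the upgrade.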

How to reproduce this issue

Manual steps

Note: try the following steps using a 3.5.x (x >= 1) binary.

Step 1: start an etcd instance

$ ./bin/etcd --name e1 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://127.0.0.1:2379 --initial-cluster "e1=http://127.0.0.1:2380" --initial-cluster-state new

Step 2: add a learner in another terminal

$ ./bin/etcdctl member add e2 --peer-urls=http://127.0.0.1:2382 --learner

Step 3: start the learner

$ ./bin/etcd --name e2 --initial-advertise-peer-urls http://127.0.0.1:2382 --listen-peer-urls http://127.0.0.1:2382 --advertise-client-urls http://127.0.0.1:2378 --listen-client-urls http://127.0.0.1:2378 --initial-cluster "e1=http://127.0.0.1:2380,e2=http://127.0.0.1:2382" --initial-cluster-state existing

Step 4: promote the learner (replace the ID below with the member ID printed by member add in Step 2)

$ ./bin/etcdctl member promote 155a4a14c50481b8

Step 5: stop both etcd instances

Step 6: check the bbolt db file directly

Use the etcd-dump-db tool to check the db file. You will see that the already-promoted learner is still marked as a learner ("isLearner":true).

$ ./etcd-dump-db iterate-bucket ../../e1.etcd/ members
key="e610623c040f129c", value="{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"
key="b71f75320dc06a6c", value="{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"
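When checking many data directories, it may help to scan the etcd-dump-db output programmatically. Below is a minimal Go sketch; it assumes each line has exactly the key="...", value="..." shape shown above, and parseDumpLine and learnerIDs are hypothetical helpers, not part of etcd.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// memberRecord matches the JSON fields visible in the members bucket dump above.
type memberRecord struct {
	ID        uint64   `json:"id"`
	PeerURLs  []string `json:"peerURLs"`
	Name      string   `json:"name"`
	IsLearner bool     `json:"isLearner"`
}

// parseDumpLine parses one key="...", value="..." line; the value is a
// double-quoted string with escaped inner quotes, so strconv.Unquote decodes it.
func parseDumpLine(line string) (memberRecord, error) {
	var m memberRecord
	i := strings.Index(line, "value=")
	if i < 0 {
		return m, fmt.Errorf("no value in line %q", line)
	}
	raw, err := strconv.Unquote(line[i+len("value="):])
	if err != nil {
		return m, err
	}
	err = json.Unmarshal([]byte(raw), &m)
	return m, err
}

// learnerIDs returns the hex IDs of all members still flagged as learners.
func learnerIDs(dump string) ([]string, error) {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		m, err := parseDumpLine(line)
		if err != nil {
			return nil, err
		}
		if m.IsLearner {
			ids = append(ids, strconv.FormatUint(m.ID, 16))
		}
	}
	return ids, sc.Err()
}

func main() {
	dump := `key="e610623c040f129c", value="{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"
key="b71f75320dc06a6c", value="{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"`
	ids, err := learnerIDs(dump)
	if err != nil {
		panic(err)
	}
	fmt.Println("members still flagged as learners:", ids)
}
```

The ID is formatted in base 16 so it can be matched directly against the bucket keys and the IDs shown by etcdctl.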

Automatic step

Just execute upgrade_test.sh (you need to download both the v3.5.19 and v3.6.0-rc.2 binaries beforehand). Afterwards, check the log; you will see the error message "membership: too many learner members in cluster".

Proposed solution & actions

Proposal for release-3.5

  • Fix the bug mentioned above (code pasted again below), following the main branch's implementation. Also add an e2e test to verify that the membership data is consistent between v2store and v3store. We should probably add this verification to production code, but enable it only in tests.

if unsafeMemberExists(tx, mkey) {
	return errMemberAlreadyExist
}

  • Provide etcdutl commands to
    • check the membership data differences between v2store and v3store.
      • Something like etcdutl check members --data-dir path-2-data-dir
    • sync the membership data between v2store and v3store.
      • Something like etcdutl sync members --data-dir path-2-data-dir
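The proposed etcdutl check/sync commands are only ideas at this point, so the following Go sketch illustrates just the comparison and sync logic on two in-memory maps. diffMembers and syncMembers are hypothetical helpers; the sync treats v2store as authoritative (as it is in 3.5) and overwrites existing keys instead of refusing them with errMemberAlreadyExist.

```go
package main

import (
	"fmt"
	"sort"
)

type Member struct {
	ID        uint64
	Name      string
	IsLearner bool
}

// diffMembers reports member IDs whose records differ between v2store and
// v3store, including IDs present in only one of them, sorted ascending.
func diffMembers(v2, v3 map[uint64]Member) []uint64 {
	var diff []uint64
	for id, m2 := range v2 {
		if m3, ok := v3[id]; !ok || m3 != m2 {
			diff = append(diff, id)
		}
	}
	for id := range v3 {
		if _, ok := v2[id]; !ok {
			diff = append(diff, id)
		}
	}
	sort.Slice(diff, func(i, j int) bool { return diff[i] < diff[j] })
	return diff
}

// syncMembers makes v3store match v2store: existing keys are overwritten
// (unlike the 3.5 save path), and members missing from v2store are removed.
func syncMembers(v2, v3 map[uint64]Member) {
	for id, m := range v2 {
		v3[id] = m
	}
	for id := range v3 {
		if _, ok := v2[id]; !ok {
			delete(v3, id) // deleting during range is safe in Go
		}
	}
}

func main() {
	v2 := map[uint64]Member{1: {ID: 1, Name: "e1"}, 2: {ID: 2, Name: "e2"}}
	v3 := map[uint64]Member{1: {ID: 1, Name: "e1"}, 2: {ID: 2, Name: "e2", IsLearner: true}}
	fmt.Println("inconsistent members:", diffMembers(v2, v3))
	syncMembers(v2, v3)
	fmt.Println("after sync:", diffMembers(v2, v3))
}
```

A real tool would read both stores from the data directory rather than from literals; that part is deliberately left out here.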

Proposal for main & release-3.6

Make the change to main first, then backport it to release-3.6.

  • Add an e2e test, similar to upgrade_test.sh, that covers the upgrade case the way kubeadm does it. Rough steps:

    • start a one-member (3.5.x) cluster
    • add a learner (3.5.x), and promote it to a voting member later
    • add another learner (3.5.x), and promote it to a voting member later
    • upgrade the members to 3.6 one by one
  • We should definitely include this issue in the upgrade (3.5->3.6) checklist. Users should check that the membership data is consistent between v3store and v2store.

  • We probably need to publish an official announcement.

cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation @spzala @neolit123 @dims

Next step

Let's discuss this next Monday, including who works on what.

PRs
