
Conversation

@bharathv (Contributor) commented Oct 30, 2025

Split into two parts

  • 61a04a8 - downgrades the assert to an exception so the operation can be retried later
  • The rest of the commits reduce the likelihood of this happening.

Fixes: CORE-14308

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

  • none

This was originally an assert so we could catch things quickly during
development. A single partition running into this doesn't seem like a
reason to crash the broker. This commit changes it to a runtime
exception so the replicator can recover on a new leader and continue.

Do note that we still log an ERROR, so we are not losing any test
coverage: what was originally an assert will now be caught as a
BadLogLine.
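
For illustration, a minimal sketch of the change described above; the exception name matches the PR's file summary (it is defined in src/v/cluster_link/replication/deps.h), but the function and its signature here are assumptions, not the actual Redpanda code:

    #include <cstdint>
    #include <fmt/format.h>
    #include <stdexcept>

    // Assumed shape of the exception; the real one lives in
    // src/v/cluster_link/replication/deps.h.
    struct monotonicity_violation_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Before: a vassert here crashed the whole broker on violation.
    // After: throw a retryable exception so the replicator can step down
    // and retry on a new leader (the real code also logs at ERROR first).
    void check_monotonic(int64_t next_offset, int64_t last_replicated) {
        if (next_offset <= last_replicated) {
            throw monotonicity_violation_exception(fmt::format(
              "non-monotonic replicate request: offset {} <= last replicated {}",
              next_offset,
              last_replicated));
        }
    }
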
Returns if the commit index was updated to the snapshot index

Currently it is possible that once the commit idx is updated to the
snapshot idx, the callers are notified right away and may refer to
untruncated offset translation state. In this case the write_at_offset
stm was seeding a wrong last_offset based on the untruncated log's
offset translation state.

This commit defers the commit index notification until after the log
truncation has happened.
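
A minimal sketch of the reordering described above, with assumed names; the real update_offsets_from_snapshot is part of raft consensus (src/v/raft/consensus.cc) and runs inside a seastar coroutine:

    #include <cstdint>
    #include <functional>
    #include <vector>

    struct consensus_sketch {
        int64_t commit_index{-1};
        std::vector<std::function<void()>> commit_index_waiters;

        // Now returns whether the commit index advanced instead of
        // notifying waiters from inside the update.
        bool update_offsets_from_snapshot(int64_t snapshot_last_idx) {
            if (snapshot_last_idx <= commit_index) {
                return false;
            }
            commit_index = snapshot_last_idx;
            return true;
        }

        void truncate_log_prefix(int64_t) { /* physical truncation elided */ }

        void hydrate_snapshot(int64_t snapshot_last_idx) {
            bool commit_idx_updated
              = update_offsets_from_snapshot(snapshot_last_idx);
            // Truncate first so the stale offset translation state is gone...
            truncate_log_prefix(snapshot_last_idx);
            // ...and only then wake the waiters, so nothing like the
            // write_at_offset stm can read pre-truncation state.
            if (commit_idx_updated) {
                for (auto& notify : commit_index_waiters) {
                    notify();
                }
                commit_index_waiters.clear();
            }
        }
    };
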
@bharathv marked this pull request as ready for review October 30, 2025 23:04
Copilot AI review requested due to automatic review settings October 30, 2025 23:04
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR improves handling of non-monotonic replicate requests in cluster link replication. The key change is downgrading an assertion to a recoverable exception, along with enhancements to reduce the likelihood of monotonicity violations occurring.

Key Changes:

  • Converts a vassert for monotonic offset checks into a throwable monotonicity_violation_exception that can be caught and retried
  • Refactors update_offset_from_snapshot to return a boolean indicating whether the commit index was updated, enabling proper notification propagation
  • Adds exception handling in the replicator to gracefully recover from monotonicity violations by stepping down the partition (see the sketch after this list)
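
A hedged sketch of that recovery path; partition_stub and step_down here are placeholders, not the actual partition_replicator API:

    #include <cstdio>
    #include <stdexcept>

    // Placeholder stand-ins; the real types live under
    // src/v/cluster_link/replication/.
    struct monotonicity_violation_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    struct partition_stub {
        void step_down() { std::puts("stepping down; retry on new leader"); }
    };

    // The replicator catches the now-recoverable violation instead of
    // letting an assert crash the broker, then steps the partition down
    // and resets from the start offset it recorded when replication began.
    void replicate_once(partition_stub& partition) {
        try {
            // Stand-in for the real replicate call that performs the check.
            throw monotonicity_violation_exception(
              "non-monotonic replicate request");
        } catch (const monotonicity_violation_exception& e) {
            std::fprintf(stderr, "ERROR: %s; recovering\n", e.what());
            partition.step_down();
        }
    }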

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

  • src/v/raft/consensus.h: Changed method signature to return bool instead of void
  • src/v/raft/consensus.cc: Refactored commit index update notifications to be triggered based on the return value rather than inside the update method
  • src/v/kafka/server/write_at_offset_stm.cc: Added trace logging when applying raft snapshots
  • src/v/cluster_link/service.cc: Replaced vassert with an exception throw for monotonicity violations
  • src/v/cluster_link/replication/partition_replicator.h: Added a field to track the original start offset
  • src/v/cluster_link/replication/partition_replicator.cc: Enhanced exception handling and reset logic to use the stored start offset
  • src/v/cluster_link/replication/deps.h: Defined a new exception class for monotonicity violations
@michael-redpanda (Contributor) left a comment

Changes in cluster_link/replication make sense to me - leaving the raft stuff for @mmaslankaprv

@vbotbuildovich (Collaborator) commented

CI test results

test results on build#75373
  • DataMigrationsApiTest.test_creating_and_listing_migrations (integration, args: null)
    Status: FLAKY, passed 16/21. Upstream reliability is 96.35666347075743, current run reliability is 76.19047619047619; drift is 20.16619 and the allowed drift is set to 50. The test should PASS.
    Job: https://buildkite.com/redpanda/redpanda/builds/75373#019a3798-3528-4ead-b54e-566af0d514eb
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DataMigrationsApiTest&test_method=test_creating_and_listing_migrations

  • MountUnmountIcebergTest.test_simple_remount (integration, args: {"cloud_storage_type": 1})
    Status: FLAKY, passed 17/21. Upstream reliability is 77.25225225225225, current run reliability is 80.95238095238095; drift is -3.70013 and the allowed drift is set to 50. The test should PASS.
    Job: https://buildkite.com/redpanda/redpanda/builds/75373#019a3798-3528-4ead-b54e-566af0d514eb
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount

  • NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation (integration, args: {"cloud_topic": true, "delete_topic": true})
    Status: FLAKY, passed 20/21. Upstream reliability is 98.81656804733728, current run reliability is 95.23809523809523; drift is 3.57847 and the allowed drift is set to 50. The test should PASS.
    Job: https://buildkite.com/redpanda/redpanda/builds/75373#019a3798-3529-4434-9972-6952f1a1a5ca
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_finishes_after_manual_cancellation

  • RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test (integration, args: {"cloud_storage_type": 1, "mixed_versions": false})
    Status: FLAKY, passed 20/21. Upstream reliability is 100.0, current run reliability is 95.23809523809523; drift is 4.7619 and the allowed drift is set to 50. The test should PASS.
    Job: https://buildkite.com/redpanda/redpanda/builds/75373#019a3798-3529-4434-9972-6952f1a1a5ca
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
          co_return;
      }
-     update_offsets_from_snapshot(metadata.value());
+     auto commit_idx_updated = update_offsets_from_snapshot(metadata.value());
A Member left a comment

I am wondering if it would be possible to do this as a last step in hydrate snapshot?
Currently it seems that the race may still happen: a check in the stm manager that verifies whether an offset is ready to be applied does not need the notification, so if it executes before the co_await _configuration_manager.add scheduling point it can still observe the untruncated state.

@bharathv (Contributor, Author) commented Oct 31, 2025

Yes, correct, the race is still theoretically possible.

    I am wondering if it would be possible to do this as a last step in hydrate snapshot?

This was my first thought, but the thing I was unsure about: if we do this, the raft logical start offset would be updated after the log's physical offset (which moves in truncate_to_latest_snapshot()). Does that have any other repercussions? There could be a reader that sees the logical offset and creates a reader in the truncated portion of the log; any implications of that?

@bharathv requested a review from mmaslankaprv October 31, 2025 16:23
@dotnwat changed the title cl: better handing of non monotonic replicate requests Nov 1, 2025