Skip to content

Conversation

@dtcccc
Copy link
Contributor

@dtcccc dtcccc commented Dec 31, 2025

Purpose

This is a separate PR for #31034 to help review.
This PR refactored sender thread using async coroutines. All related data are in the same thread so that we can drop their locks. This makes the sender thread simple and easy to maintain.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Mooncake connector's sender thread to use asyncio coroutines. This is a significant improvement that simplifies the code by removing manual lock management and leveraging modern asynchronous patterns. The state related to sending operations is now managed within a single event loop, which is a robust way to prevent race conditions. The changes are well-structured and follow best practices for integrating asyncio with threads and blocking operations. However, I found a critical issue where a silent failure can occur if a requested transfer ID is not found, potentially leading to data inconsistencies. I've provided a suggestion to fix this by raising an exception instead of returning silently.

send_meta = self.reqs_need_send.get(req_id)
if send_meta is None:
logger.warning("Request %s not found in reqs_need_send", req_id)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Returning silently when a request ID is not found can lead to silent failures. The client that initiated the transfer will receive a TRANS_DONE message, making it believe the transfer was successful, while it was actually aborted. This can lead to data inconsistencies or hangs. It's better to raise an exception to signal an error condition, which will then be caught by the _sender_worker and result in a TRANS_ERROR message being sent back to the client.

Suggested change
return
raise ValueError(f"Request {req_id} not found in reqs_need_send")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant