-
Notifications
You must be signed in to change notification settings - Fork 7k
[WIP][Fix] Fix code review #59034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: add_resource_based_autoscaler
Are you sure you want to change the base?
[WIP][Fix] Fix code review #59034
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request primarily focuses on translating comments and docstrings from Chinese to English across several files, improving code readability and maintainability. It also introduces a ResourceBasedActorAutoscaler and a NodeAwareActorRemovalStrategy, along with corresponding tests. The changes are positive, especially the replacement of magic numbers with constants.
However, I've identified a few issues, mainly in the test files. There are two critical bugs: one is an incorrect comment indentation that will cause a syntax error, and the other is a typo in a variable name that will lead to a NameError. I've also pointed out some minor issues related to redundant or misplaced comments. Please address these points to ensure the tests are correct and the code quality is maintained.
| class SlowUDF: | ||
| def __init__(self): | ||
| # This UDF slows down actor creation, and thus | ||
| # This UDF slows down actor creation, and thus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # 创建两个并发执行的dataset | ||
| # Verify data integrity instead of the number of batches | ||
| total_rows2 = sum(len(batch["id"]) for batch in results2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) | ||
|
|
||
| # 直接创建测试实例,不调用 setup_class() | ||
| """Test that the node-aware strategy does not remove active actors.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert selected in [actor3, actor4, actor5] | ||
|
|
||
| # 验证统计逻辑 | ||
| """Test the logic for counting idle actors per node.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """测试按权重分配资源给多个 actor pools""" | ||
| # 创建 mock operator 和 actor pools | ||
| """Test distribute resources by weight among multiple actor pools""" | ||
| # mock operator 和 actor pools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # 创建两个并发执行的dataset | ||
| # Verify data integrity instead of the number of batches | ||
| total_rows2 = sum(len(batch["id"]) for batch in results2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
069c1fa to
94f678a
Compare
| ) | ||
|
|
||
| # 直接创建测试实例,不调��� setup_class() | ||
| """Test that the node-aware strategy does not remove active actors.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Duplicate docstring used instead of comment
A string literal duplicating the function's docstring appears as a standalone statement on line 1023, where a comment should be. Looking at the diff, the original Chinese comment # 直接创建测试实例,不调用 setup_class() was mistakenly replaced with the function's docstring """Test that the node-aware strategy does not remove active actors.""" instead of the appropriate English comment. This string literal is a no-op and appears to be a translation error.
8051811 to
4340aef
Compare
f20b9d1 to
8e6f0e0
Compare
8e6f0e0 to
1381960
Compare
| test_case._wait_for_actor_ready(pool, ready_ref) | ||
|
|
||
| assert pool.current_size() == 6 | ||
| assert pool.current_size() == 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test assertion changed breaks test logic consistency
The assertion at line 1098 was changed from pool.current_size() == 6 to pool.current_size() == 4, but the subsequent assertion at line 1105 still expects pool.current_size() == 3 after shrinking by 3 actors. This creates an arithmetic inconsistency: if the pool has 4 actors and shrinks by 3, the result should be 1 actor, not 3. The comment on line 1091 also still says "Expand to 6 actors" which contradicts the new assertion. The test will fail at runtime due to this inconsistency.
Additional Locations (1)
| min_resources=ExecutionResources(cpu=3, gpu=0, memory=3e9), | ||
| max_resources=ExecutionResources(cpu=6, gpu=0, memory=6e9) | ||
| min_resources=ExecutionResources(cpu=1, gpu=0, memory=3e9), | ||
| max_resources=ExecutionResources(cpu=6, gpu=0, memory=6e9), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test for resource contention has incomplete commented-out code
Test case 3 in test_e2e_resource_based_autoscaling claims to test "resource contention" between two datasets, but several lines are commented out: ds4._execute_to_iterator(), shared_executor4 = ds4._current_executor, and the assertion for shared_executor4. The loop at line 678 only iterates over [shared_executor3] instead of both executors, so ds4 never has resource limits applied even though the comment says "Set the same resource limits for both executors". The test still runs both datasets concurrently, but only ds3 has resource limits, making the resource contention test incomplete.
Description
Related issues
Additional information