Skip to content

Conversation

@ShawnXuan
Copy link
Collaborator

This PR is mainly to support the NPU-compatible SortedNMS operator, so sorted_score and input_indices have been added.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for an NPU-compatible SortedNMS operator by extending both the Python-level API and underlying C++ implementations to accept sorted scores and input indices.

  • Python wrapper (nms_op) now branches for NPU to gather and pass sorted_scores and score_inds.
  • Core op registration and functor logic updated to include optional scores and input_indices, with new fused operator for NPU.
  • Data type inference and API signature modified for NPU support in nms_op.cpp and functional_api.yaml.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
python/oneflow/nn/modules/nms.py Branch NPU path to gather sorted scores and pass indices
oneflow/user/ops/nms_op.cpp InferNmsDataType now sets different dtype for NPU
oneflow/ir/include/OneFlow/OneFlowUserOps.td Added optional scores and input_indices inputs
oneflow/core/functional/impl/nn_functor.cpp Built fused_op_ and dispatch based on device and inputs
oneflow/core/functional/impl/array_functor.cpp Route argwhere through CPU for NPU
oneflow/core/functional/functional_api.yaml Updated nms signature with new optional tensor args
Comments suppressed due to low confidence (1)

oneflow/user/ops/nms_op.cpp:34

  • The CPU path sets output dtype to Int8, but NMS usually returns index tensors; Int32 or Int64 may be more appropriate.
ctx->SetOutputDType("out", 0, DataType::kInt8);
return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);
DeviceType device_type = JUST(x->device())->enum_type();
if (device_type == DeviceType::kNPU) {
if (scores) {
Copy link

Copilot AI Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fused NMS path checks only scores; it should verify both scores and input_indices are provided to avoid passing a null optional downstream.

Suggested change
if (scores) {
if (scores && input_indices) {
Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Contributor

github-actions bot commented Jun 1, 2025

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.


- name: "nms"
signature: "Tensor (Tensor x, Float iou_threshold, Int32 keep_n=-1) => Nms"
signature: "Tensor (Tensor x, Tensor scores=None, Tensor input_indices=None, Float iou_threshold, Int32 keep_n=-1) => Nms"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看看是否需要/方便,为npu的nms导出独立的api/functor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants