Skip to content

Conversation

@wagner-austin
Copy link

@wagner-austin wagner-austin commented Dec 30, 2025

Summary

Change _is_pyarrow_table() return type from bool to TypeGuard[pa_Table] to enable mypy type narrowing at all call sites. This follows the existing pattern used by _is_pyarrow_array().

Mypy Error (before fix)

python-package/lightgbm/basic.py:2110: error: Item "str" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "Path" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "ndarray[tuple[Any, ...], dtype[Any]]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "Sequence" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "list[Sequence]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "list[ndarray[tuple[Any, ...], dtype[Any]]]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]

Related Issue

Fixes: #3867

Test Plan

  • Ran pre-commit run --files python-package/lightgbm/basic.py - all checks passed
  • Verified mypy no longer reports union-attr errors for line 2110
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest in LightGBM and for taking this on!

For PRs like this, including the exact error text from mypy is helpful. It helps other contributors know specifically what you're trying to fix, and helps others searching GitHub for similar errors to find your solution.

I guess in this case it's all of these?

python-package/lightgbm/basic.py:2110: error: Item "str" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "Path" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "ndarray[tuple[Any, ...], dtype[Any]]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "Sequence" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "list[Sequence]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]
python-package/lightgbm/basic.py:2110: error: Item "list[ndarray[tuple[Any, ...], dtype[Any]]]" of "str | Path | ndarray[tuple[Any, ...], dtype[Any]] | Any | Sequence | list[Sequence] | list[ndarray[tuple[Any, ...], dtype[Any]]]" has no attribute "column_names"  [union-attr]

If so, awesome find! I can see how mypy wouldn't be able to understand this function call, because it's only returning bool:

def _is_pyarrow_table(data: Any) -> bool:

But there are several calls to that function and you're only changing 1 here.

git grep '_is_pyarrow_table'

Instead, could you please:

  • revert this change
  • update the return hint of _is_pyarrow_table to "TypeGuard[pa_Table]"

And see if that resolves it?

https://mypy.readthedocs.io/en/latest/type_narrowing.html describes TypeGuard if you haven't seen it before.

@jameslamb
Copy link
Collaborator

For PRs like this, including the exact error text from mypy is helpful.

@wagner-austin I see you've opened more PRs with type-hinting fixes. Much appreciated, and it's great to have them split up in small contributions!

Before we review any more of those, please do this (including the exact text from mypy in the description) for all of them.

@jameslamb
Copy link
Collaborator

jameslamb commented Dec 30, 2025

Oh and can you please also link all of them to the mypy tracking issue as well?

#3867

… mypy type narrowing

Change _is_pyarrow_table() return type from bool to TypeGuard[pa_Table] to enable mypy type narrowing at all call sites. This follows the existing pattern used by _is_pyarrow_array() and allows mypy to recognize data.column_names as valid when extracting feature names from PyArrow tables.

Fixes: microsoft#3867
@wagner-austin wagner-austin force-pushed the fix-pyarrow-table-type-narrowing branch from 4c20333 to f83c38e Compare December 30, 2025 05:27
@wagner-austin wagner-austin changed the title [python-package] Replace _is_pyarrow_table() with isinstance() for type narrowing Dec 30, 2025
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation and changes, looks great!

Confirmed in the CI logs that this change eliminates those errors and gets us down to just 12 mypy errors remaining 🎉

Found 12 errors in 4 files (checked 9 source files)

(build link)

Don't worry if the R-package / rchck CI job fails... that's a known issue unrelated to any changes you make in the Python package (#7113).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants