fix: Use uv Python for MCore dataset compilation (#438) #807

yuhezhang-ai · 2025-11-16T22:44:39Z

Description

Changed the Makefile to use `uv run python` instead of the system `python3`, so the compiled extension matches the uv Python environment.

Also added the `-undefined dynamic_lookup` linker flag on macOS to fix 'Undefined symbols' errors during the build.
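To illustrate both changes, here is a minimal Python sketch of how such a build command could be assembled. It assumes a pybind11-style shared-library extension built from a hypothetical `helpers.cpp` into a module named `helpers_cpp`; the function name `build_command`, the file names, and the compiler flags are illustrative assumptions, not the actual Makefile contents. The points it shows are that the output suffix comes from the interpreter that runs the build, and that `-undefined dynamic_lookup` is added only on macOS.

```python
from __future__ import annotations

import sys
import sysconfig


def build_command(
    src: str = "helpers.cpp",      # hypothetical source file name
    libname: str = "helpers_cpp",  # hypothetical module name
    platform: str = sys.platform,
    ext_suffix: str | None = None,
    include_flags: list[str] | None = None,
) -> list[str]:
    """Assemble a C++ compiler command for a Python extension module (sketch).

    The extension suffix defaults to the *running* interpreter's EXT_SUFFIX,
    which is why the build should run under the uv-managed Python
    (e.g. via `uv run python`) rather than the system python3.
    """
    if ext_suffix is None:
        ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if include_flags is None:
        # Python headers of the running interpreter; a real pybind11 build
        # would also need the pybind11 include directory.
        include_flags = [f"-I{sysconfig.get_paths()['include']}"]
    cmd = ["c++", "-O3", "-Wall", "-shared", "-std=c++17", "-fPIC", *include_flags, src]
    if platform == "darwin":
        # macOS only: leave Python C-API symbols unresolved at link time;
        # they are resolved from the interpreter when the module is imported.
        cmd += ["-undefined", "dynamic_lookup"]
    cmd += ["-o", f"{libname}{ext_suffix}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical usage: `uv run python build_cmd.py` prints the command
    # for whichever interpreter uv resolves.
    print(" ".join(build_command()))
```

Because the suffix is read from `sysconfig` at run time, invoking the same logic through the system `python3` would silently produce a module named for that interpreter instead.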

Testing

Verified with system Python 3.11 and uv Python 3.12: the compiled .so file now correctly targets the uv Python version (3.12).
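A check like the one described above can be sketched in Python by comparing the compiled file's name against the interpreter's `EXT_SUFFIX` (for example `.cpython-312-x86_64-linux-gnu.so` for CPython 3.12 on x86-64 Linux). The function name `extension_matches_interpreter` and the script name in the usage comment are hypothetical; this is a minimal sketch, not the verification procedure used in this PR.

```python
from __future__ import annotations

import sys
import sysconfig
from pathlib import Path


def extension_matches_interpreter(so_path: str, ext_suffix: str | None = None) -> bool:
    """Return True if a compiled extension's filename ends with ext_suffix.

    By default, compares against the running interpreter's EXT_SUFFIX, so
    running this under `uv run python` checks against the uv Python.
    """
    if ext_suffix is None:
        ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return Path(so_path).name.endswith(ext_suffix)


if __name__ == "__main__":
    # Hypothetical usage: uv run python check_ext.py path/to/helpers_cpp*.so
    expected = sysconfig.get_config_var("EXT_SUFFIX")
    for path in sys.argv[1:]:
        status = "OK" if extension_matches_interpreter(path) else "MISMATCH"
        print(f"{status}: {path} (expected suffix {expected})")
```

A module built by a Python 3.11 `python3` would carry a `cpython-311` tag and be reported as a mismatch when checked under the uv-managed 3.12 interpreter.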

Fixes #438

First-time contributor here. I'm a research engineer transitioning from edge models to LLM infrastructure and algorithms. Happy to help with more tasks in the future. Thanks for reviewing!

Changed Makefile to use 'uv run python' instead of system 'python3', ensuring the compiled extension matches the uv Python environment. Also added '-undefined dynamic_lookup' linking flag for macOS to fix 'Undefined symbols' errors during compilation. Fixes NVIDIA-NeMo#438

Signed-off-by: Yuhe Zhang <yuhe@polarr.co>

copy-pr-bot · 2025-11-16T22:44:43Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

adil-a · 2025-11-16T23:37:44Z

Thank you so much @yuhezhang-ai ! We really appreciate community contributions :)

@nvidia-nemo/automation @thomasdhc can you help verify the changes?

akoumpa · 2025-11-17T18:52:24Z

/ok to test 8d95c2c

yuhezhang-ai · 2025-11-18T17:28:16Z

Hi, I updated the branch to include the Makefile package-data fix from main.

About the previous CI failures:

Looking at the logs, the failures were due to:

RuntimeError: PyTorch has CUDA Version=12.9 and torchvision has CUDA Version=13.0

This appears to be a dependency resolution issue in the CI environment, unrelated to the Makefile changes (the compilation test itself passed).

Should I wait to see if this persists after the package-data fix, or would you like me to investigate a torchvision version constraint, such as adding a torchvision version pin to pyproject.toml?

Thanks!

thomasdhc · 2025-11-18T20:19:46Z

Hey @yuhezhang-ai, thanks for this update. The `uv run python` call is actually re-installing torch when it should not, and that is what causes this error. The underlying cause is that part of our testing setup incorrectly mounts another copy of Automodel. I'll need to make changes to the overall test workflow; once that PR is done, I will apply those changes to this PR.

No further action needs to be taken from your side.

Thanks!

yuhezhang-ai · 2025-11-19T03:29:48Z

Thanks for clarifying! I appreciate you taking the time to explain the root cause.

I'm interested in contributing more to the project as I learn about LLM infrastructure. Are there other issues that might be suitable for me to work on?

My background:

- Computer vision research engineer with algorithm experience (and actively learning LLM/VLM)
- Some Triton kernel knowledge, but limited distributed training experience
- No GPU cluster access, but can test single-GPU scenarios via Colab

I can probably help with algorithms, code quality, bug fixes, and kernel optimization: tasks that can be developed and verified on a single GPU.

For example, I noticed that #780 (sequence classification metrics bug) seems suitable for me. It's about correctness and can be tested on Colab, though it already has an assignee.

Happy to help with whatever you think would be suitable! 🙏

adil-a · 2025-11-19T04:42:45Z

Hey @yuhezhang-ai, thank you so much for your enthusiasm! It'd be great to have more hands on board :) We usually file any open issues on the GitHub Issues tab, so feel free to pick up anything that interests you. #780 might be a good, easy one to pick up.

akoumpa · 2025-11-19T18:02:55Z

/ok to test ecde148

yuhezhang-ai requested review from HuiyingLi, adil-a, akoumpa and hemildesai as code owners November 16, 2025 22:44

github-actions bot added the community-request label Nov 16, 2025

copy-pr-bot bot temporarily deployed to test November 17, 2025 18:52 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 17, 2025 18:52 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 17, 2025 23:02 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 17, 2025 23:17 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 17, 2025 23:17 Error

copy-pr-bot bot temporarily deployed to nemo-ci November 17, 2025 23:17 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 17, 2025 23:17 Failure

Update Makefile

986d48e

copy-pr-bot bot temporarily deployed to nemo-ci November 18, 2025 08:10 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 18, 2025 08:10 Failure

Merge branch 'main' into yuhezhang/fix/438/use-uv-for-mcore-subprocess

ecde148

copy-pr-bot bot temporarily deployed to test November 19, 2025 18:03 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 19, 2025 18:03 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 19, 2025 21:45 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci November 19, 2025 22:07 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci November 19, 2025 22:07 Failure

copy-pr-bot bot temporarily deployed to nemo-ci November 19, 2025 22:07 Inactive

snowmanwwg added external x-pixieset labels Dec 9, 2025
