fix: Use uv Python for MCore dataset compilation (#438) #807
Conversation
Changed Makefile to use 'uv run python' instead of system 'python3', ensuring the compiled extension matches the uv Python environment. Also added '-undefined dynamic_lookup' linking flag for macOS to fix 'Undefined symbols' errors during compilation.
Fixes NVIDIA-NeMo#438
Signed-off-by: Yuhe Zhang <yuhe@polarr.co>
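The Makefile itself is not shown in this thread, so the following is only a rough Python sketch of the idea behind the change: derive the extension suffix and include path from the interpreter that is actually running, and add '-undefined dynamic_lookup' only on macOS. The file names (helpers.cpp, helpers_cpp), the compiler name, and every flag other than '-undefined dynamic_lookup' are assumptions made for illustration.

```python
import sys
import sysconfig


def extension_suffix() -> str:
    # Suffix of extension modules for the *running* interpreter,
    # e.g. one that names CPython 3.12 when run under uv's Python 3.12.
    return sysconfig.get_config_var("EXT_SUFFIX") or ".so"


def link_flags(platform: str = sys.platform) -> list[str]:
    # -shared is an assumed baseline flag; the macOS-only part is the
    # flag named in this PR, which defers symbol resolution to load time.
    flags = ["-shared"]
    if platform == "darwin":
        flags += ["-undefined", "dynamic_lookup"]
    return flags


def build_command(source: str, libname: str, compiler: str = "c++",
                  platform: str = sys.platform) -> list[str]:
    # Hypothetical compile command; the real Makefile may use different flags.
    include_dir = sysconfig.get_paths()["include"]
    return [
        compiler, "-O3", "-fPIC",
        *link_flags(platform),
        f"-I{include_dir}",
        source,
        "-o", libname + extension_suffix(),
    ]


if __name__ == "__main__":
    # helpers.cpp / helpers_cpp are placeholder names.
    print(" ".join(build_command("helpers.cpp", "helpers_cpp")))
```

Running a script like this with 'uv run python' rather than 'python3' makes sysconfig report the uv environment's interpreter, which is the mismatch the PR describes fixing.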
|
Thank you so much @yuhezhang-ai ! We really appreciate community contributions :) @nvidia-nemo/automation @thomasdhc can you help verify the changes? |
|
/ok to test 8d95c2c |
|
Hi, I updated the branch to include the Makefile package-data fix from main. About the previous CI failures: looking at the logs, the failures were due to: This appears to be a dependency resolution issue in the CI environment, unrelated to the Makefile changes (the compilation test itself passed). Should I wait to see if this persists after the package-data fix, or would you like me to investigate a torchvision version constraint, such as adding a torchvision version pin to Thanks! |
|
Hey @yuhezhang-ai, thanks for this update. 'uv run python' is actually re-installing torch when it should not be, and that is what causes this error. The root cause is that some of our testing setup incorrectly mounts another copy of Automodel. I'll need to make changes to the overall test workflow. When that PR is done, I will apply those changes to this PR. No further action is needed from your side. Thanks! |
|
Thanks for clarifying! I appreciate you taking the time to explain the root cause. I'm interested in contributing more to the project as I learn about LLM infrastructure. Are there other issues that might be suitable for me to work on? My Background:
I can probably help with algorithms, code quality, bug fixes, and kernel optimization: tasks that can be developed and verified on a single GPU. For example, I noticed #780 (sequence classification metrics bug) seems suitable for me. It's about correctness and can be tested on Colab, though it already has an assignee. Happy to help with whatever you think would be suitable! 🙏 |
|
Hey @yuhezhang-ai, thank you so much for your enthusiasm! It'd be great to have more hands on board :) We usually file open issues on the GitHub Issues tab, so feel free to pick up anything that interests you. #780 might be a good and easy one to start with. |
|
/ok to test ecde148 |
Description
Changed Makefile to use 'uv run python' instead of system 'python3', ensuring the compiled extension matches the uv Python environment.
Also added '-undefined dynamic_lookup' linking flag for macOS to fix 'Undefined symbols' errors during compilation.
Testing
Verified with system Python 3.11 and uv Python 3.12 - the compiled '.so' file now correctly uses the uv Python version (3.12).
Fixes #438
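One possible way to repeat the check described under Testing is to compare a compiled file's name against the extension suffix of the running interpreter. This is a minimal sketch, not the verification the author ran; the function name and the example file names are hypothetical.

```python
import sysconfig


def built_for_running_interpreter(filename: str) -> bool:
    # EXT_SUFFIX encodes the interpreter version and platform, so a file
    # compiled against a different Python will normally not end with it.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return bool(suffix) and filename.endswith(suffix)


if __name__ == "__main__":
    # helpers_cpp is a placeholder library name.
    candidate = "helpers_cpp" + (sysconfig.get_config_var("EXT_SUFFIX") or ".so")
    print(candidate, built_for_running_interpreter(candidate))
```

Run under 'uv run python', a file built with the system 'python3' of a different version would be expected to fail this check, since its suffix names the other interpreter.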
First-time contributor here. I'm a research engineer transitioning from edge models to LLM infrastructure and algorithms. Happy to help with more tasks in the future. Thanks for reviewing!