Add ObjectCode ptx constructor #470

Merged: 10 commits into NVIDIA:main on Mar 1, 2025

Conversation

brandon-b-miller (Contributor)

Adds the ability to link multiple PTX files with one linker instance.
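
A minimal sketch of the usage this enables, assuming the cuda.core.experimental names discussed in this thread; the exact ObjectCode constructor form and the LinkerOptions fields shown are assumptions, not the verbatim API:

    # Sketch only: ObjectCode(..., "ptx") and LinkerOptions(arch=...) are
    # assumed forms of the cuda.core.experimental API, not verbatim.
    from cuda.core.experimental import Linker, LinkerOptions, ObjectCode

    with open("a.ptx", "rb") as f:
        ptx_a = f.read()
    with open("b.ptx", "rb") as f:
        ptx_b = f.read()

    # Wrap each PTX blob in an ObjectCode, then link them all with a
    # single Linker instance into one cubin.
    linker = Linker(
        ObjectCode(ptx_a, "ptx"),
        ObjectCode(ptx_b, "ptx"),
        options=LinkerOptions(arch="sm_80"),
    )
    cubin = linker.link("cubin")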

@leofang added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels on Feb 25, 2025
@leofang added this to the cuda.core beta 3 milestone on Feb 25, 2025

@ksimpson-work (Contributor) left a comment

I think the title of the PR should be changed to something like "Add ObjectCode constructors". Linking multiple PTX object codes is already supported.

I personally don't think we should allow people to construct PTX object codes, but if we are going to do so, I agree with Leo that we need to be very explicit in the constructor's documentation that it is not the main path and should only be used when necessary.

Thanks for contributing!

ksimpson-work previously approved these changes on Feb 26, 2025

@rwgk (Collaborator) left a comment

Drive-by comments; very minor; I was mostly curious.

@brandon-b-miller changed the title from "Support linking multiple ptx files" to "Add ObjectCode ptx constructor" on Feb 27, 2025
@brandon-b-miller (Contributor, Author)

On CUDA 12, in a configuration where the driver is behind the runtime, I'm encountering a minor version compatibility (MVC) error at test_object_code_load_ptx.

cuda.core.experimental._utils.CUDAError: CUDA_ERROR_UNSUPPORTED_PTX_VERSION: the provided PTX was compiled with an unsupported toolchain.

I'm guessing this has something to do with not invoking nvJitLink where we otherwise should be; I'm investigating.

@ksimpson-work (Contributor)

On CUDA 12, in a configuration where the driver is behind the runtime, I'm encountering a minor version compatibility (MVC) error at test_object_code_load_ptx.

cuda.core.experimental._utils.CUDAError: CUDA_ERROR_UNSUPPORTED_PTX_VERSION: the provided PTX was compiled with an unsupported toolchain.

I'm guessing this has something to do with not invoking nvJitLink where we otherwise should be; I'm investigating.

As far as we know, this is expected behaviour. This is an error raised by the driver. We raise a warning from Program.compile() if you compile PTX in an environment where the driver is behind the runtime. Did you get this warning?

@leofang (Member) commented on Feb 27, 2025

/ok to test

leofang previously approved these changes on Feb 27, 2025

@leofang (Member) commented on Feb 27, 2025

As far as we know, this is expected behaviour. This is an error raised by the driver. We raise a warning from Program.compile() if you compile PTX in an environment where the driver is behind the runtime. Did you get this warning?

Keenan is right and this error is reproduced in the CI (with CTK 12.8 & driver 12.4). @brandon-b-miller we should catch the warning

RuntimeWarning: The CUDA driver version is older than the backend version. The generated ptx will not be loadable by the current driver.

as noted by Keenan, and skip the test_object_code_load_ptx test when a warning is raised. In this case nvJitLink is not involved; in fact, if it were, then we'd get a valid PTX to load.
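
One hypothetical way to do that conditional skip (a sketch only; the helper name and scaffolding are illustrative, not the actual test-suite code, and the message substring comes from the warning quoted above):

    # Hypothetical helper, not the actual test code: compile to PTX and
    # skip the current test if the driver-older-than-backend warning fires.
    import warnings

    import pytest

    def compile_ptx_or_skip(program):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            ptx = program.compile("ptx")
        if any("older than the backend version" in str(w.message) for w in caught):
            pytest.skip("driver is older than the toolkit; the PTX is not loadable")
        return ptx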

@brandon-b-miller (Contributor, Author)

What do you think of the changes in 9276d11? It seems we dodge the warning when constructing an ObjectCode directly, since we avoid the Program.compile code path.

@leofang (Member) commented on Feb 28, 2025

/ok to test

@kkraus14 (Collaborator)

Should we explore integrating and providing an interface to nvptxcompiler as opposed to using the driver APIs?

@leofang (Member) commented on Feb 28, 2025

It seems we dodge the warning when constructing an ObjectCode directly, since we avoid the Program.compile code path.

Right, the warning is raised in the fixture, which is passed as a test function argument. Off the top of my head I don't know if there's a way to capture such warnings and do a conditional skip. This change looks fine to me.

@leofang (Member) commented on Feb 28, 2025

Should we explore integrating and providing an interface to nvptxcompiler as opposed to using the driver APIs?

@kkraus14 we discussed that before and decided to avoid it because nvptxcompiler does not provide a shared library, only a static one, which would cause huge binary-size bloat and a deployment challenge as painful as pynvjitlink's, for a niche use case. nvJitLink can link multiple PTXs together into a cubin, which is what's already enabled in Linker and tested in this PR. From what I can tell this should meet all of numba-cuda's needs.

@kkraus14 (Collaborator)

Should we explore integrating and providing an interface to nvptxcompiler as opposed to using the driver APIs?

nvJitLink can link multiple PTXs together into a cubin, which is what's already enabled in Linker and tested in this PR. From what I can tell this should meet all of numba-cuda's needs.

IIRC the motivating use case for nvptxcompiler is that it solves the problem of using a newer CTK with an older driver with regard to PTX versioning. I believe it would address #470 (comment), for example.

@kkraus14 we discussed that before and decided to avoid it because nvptxcompiler does not provide a shared library, only a static one, which would cause huge binary-size bloat and a deployment challenge as painful as pynvjitlink's, for a niche use case.

I would argue this is becoming less and less of a niche use case as we package the CTK better and better for Python users. Regarding bloat, in the fullness of time we should probably create a Pythonic interface/bindings for it and package it as a separate library that could optionally be used by cuda.core, or something similar.

@leofang (Member) commented on Feb 28, 2025

IIRC the motivating use case for nvptxcompiler is that it solves the problem of using a newer CTK with an older driver with regard to PTX versioning. I believe it would address #470 (comment), for example.

I think that example is really a corner case that we want to discourage. If one has multiple PTX and/or LTO-IR to link together, and we want the final output to be loadable by the driver (not just for humans to inspect/debug the output), then the output should be a CUBIN, not PTX. That example does not follow the recommended practice per the CUDA Compatibility doc.

nvJitLink is designed for exactly this use case. An old driver cannot load a PTX generated by a newer toolchain, but it has no problem loading CUBIN generated by a newer toolchain, and nvJitLink can generate a CUBIN as the final output.

In fact, cuda.core uses nvJitLink for multiple purposes due to this exceptional capability:

  • JIT-compiling a single PTX to CUBIN (this happens in Program)
  • Linking either multiple PTX to CUBIN or multiple LTO-IR to CUBIN/PTX (this happens in Linker)

This allows us to have a full solution for CUDA minor version compatibility.
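
For instance (a sketch assuming current cuda.core.experimental names; the ProgramOptions fields and the kernel are illustrative), requesting a cubin instead of PTX keeps the output loadable by an older driver:

    # Sketch only: ProgramOptions(arch=...) and the kernel are illustrative.
    from cuda.core.experimental import Program, ProgramOptions

    source = r"""
    extern "C" __global__ void scale(float *x, float a) {
        x[threadIdx.x] *= a;
    }
    """

    prog = Program(source, code_type="c++", options=ProgramOptions(arch="sm_80"))
    # Compiling to "cubin" goes through nvJitLink, so the result loads even
    # when the driver is older than the toolkit that produced the PTX.
    module = prog.compile("cubin")
    kernel = module.get_kernel("scale")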

Regarding bloat, in the fullness of time we should probably create a Pythonic interface/bindings for it and package it as a separate library that could optionally be used by cuda.core, or something similar.

At some point I would like to cover bindings for all CTK components (except for the math libs, which now live in nvmath.bindings), so it sounds like a reasonable ask, and I created #478 to track it. However, I really do not see a use case for it now, given that nvJitLink solves all the problems and is much friendlier to maintain, package, and deploy.

@kkraus14 (Collaborator)

nvJitLink is designed for exactly this use case. An old driver cannot load a PTX generated by a newer toolchain, but it has no problem loading CUBIN generated by a newer toolchain, and nvJitLink can generate a CUBIN as the final output.

Based on the documentation, it looks like nvJitLink uses nvptxcompiler under the hood (you need to include nvptxcompiler when statically linking nvJitLink), so this makes sense. I wasn't aware of this capability. Thanks!

@leofang (Member) commented on Mar 1, 2025

Yeah I feel we should document some of these discussions. Perhaps as a Q&A or tips & tricks emphasizing CUDA minor version compat support.

@leofang merged commit 3d413ed into NVIDIA:main on Mar 1, 2025
74 checks passed
@leofang (Member) commented on Mar 1, 2025

Thanks @brandon-b-miller and @ksimpson-work for driving this PR across the finish line!

@leofang (Member) commented on Mar 3, 2025

Yeah I feel we should document some of these discussions.

See #480.
