Improve perf of accessing `dev.compute_capability` #459
Conversation
/ok to test |
I looked here while working on the release notes. (I looked at the code before while working on #439 a couple days ago, but didn't realize then that the code is so new.) From the PR description:
I'm almost certain that there can be a race now: https://chatgpt.com/share/67d4bc2f-6c7c-8008-9d86-425fe77a3ed9 |
I don't get it. This is thread-local storage, not a normal Python object; why do we need any lock? |
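(For readers skimming this thread: below is a minimal, hypothetical sketch of the `threading.local()` behavior being referred to. The names are illustrative, not the library's actual code; the point is that each thread that touches the object sees only its own attributes, so there is no shared state for a lock to guard.)

```python
import threading

# Illustrative only: each thread that touches `tls` sees its own, independent
# set of attributes, so there is no shared state for a lock to protect.
tls = threading.local()

def worker(name):
    # hasattr/setattr here only look at this thread's private namespace.
    if not hasattr(tls, "cache"):
        tls.cache = f"computed by {name}"
    print(name, "->", tls.cache)

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread prints the value it computed itself; no thread ever observes another thread's attribute.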
I can try a forward fix later tonight. Should be relatively easy.
Multiple threads can get to the for loop.
|
Let us not rush into a fix; let's check CPython's threading.local internals first. I believe the implementation already has a lock, so Python-level access is guaranteed to be thread safe. |
I only have a minute right now, hoping for a piece of information that will help me when I have a block of time later:
|
Yes, each thread has its own |
See #520: no race, but it recomputes |
Right, this is the consequence of storing data in thread-local storage. It was already the case before this PR, and this is why having a lock makes no sense. Each thread always has its own copy of |
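(To make the trade-off concrete, here is a small illustrative sketch, again with made-up names rather than the library's code: with a thread-local cache there is no race to protect against, but every thread performs the computation once.)

```python
import threading

tls = threading.local()
computed_by = []  # list.append is atomic in CPython; used here only to count computations

def expensive(name):
    computed_by.append(name)  # record that this thread actually did the work
    return f"value computed by {name}"

def get_cached(name):
    # No data race: the hasattr/setattr pair touches only this thread's namespace.
    # The trade-off is that each thread pays for one call to expensive().
    if not hasattr(tls, "value"):
        tls.value = expensive(name)
    return tls.value

threads = [threading.Thread(target=get_cached, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(computed_by))  # 4 -- recomputed once per thread, never shared or torn
```

The count comes out to 4 because each thread fills its own cache: the work is duplicated per thread, but no value is ever shared or torn.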
Do we want that behavior? Or would it be better if |
Yes. For example, |
Wow, thanks! I need to read up. I completely misinterpreted what the loop is supposed to do. |
No worries! Always good to have extra pairs of eyes 🙂 |
Part of #439.
Before this PR:
With this PR:
which I consider good enough for a pure Python implementation. Compared to the CuPy counterpart (which is Cython-based and returns a string instead of a namedtuple):
Note that the perf improvement of retrieving a `Device()` instance is out of scope of this PR and pending investigation (#439 (comment)).

As part of this PR, I also removed a silly lock in the `Device` constructor. The data being protected is already placed in thread-local storage, so it makes no sense to add another lock.
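(As a rough illustration of the pattern described above, here is a hedged sketch of a constructor that caches instances in thread-local storage without a lock. The class name echoes the discussion, but the structure, attribute names, and the stand-in values where the real code queries the CUDA driver are all assumptions, not the actual cuda.core implementation.)

```python
import threading

_tls = threading.local()

class Device:
    """Hypothetical sketch only; names and values are assumptions, not cuda.core code."""

    def __new__(cls, device_id=0):
        # No lock needed: _tls.devices exists separately in every thread, so even if
        # several threads reach the loop below at once, each fills only its own list.
        if not hasattr(_tls, "devices"):
            total = 4  # stand-in for the real device-count query
            _tls.devices = []
            for dev_id in range(total):
                dev = super().__new__(cls)
                dev._id = dev_id
                _tls.devices.append(dev)
        return _tls.devices[device_id]

    @property
    def compute_capability(self):
        # Caching the result avoids repeating a relatively slow driver query.
        if not hasattr(self, "_cc"):
            self._cc = (9, 0)  # stand-in for the real attribute query
        return self._cc
```

Even if several threads hit the `for` loop at the same time, each one is filling a list that only it can see, which is why an extra lock adds overhead without adding safety.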