forked from NVIDIA/MatX
Merging cudastf branch to main branch #2
Merged
* Use FetchContent to import the local rapids-cmake.
* Fix solver interfaces to use executor in cache
* Add recursive mutex around cache lookup
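For readers unfamiliar with the pattern, here is a minimal sketch of a cache lookup guarded by a recursive mutex. It is not the MatX cache: `PlanCache`, `lookup_or_create`, and the integer "plan" are hypothetical names chosen for illustration. The assumption is that building one cache entry may itself trigger another lookup on the same thread, which would deadlock with a plain `std::mutex` but is safe with `std::recursive_mutex`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical cache, for illustration only (not the MatX implementation).
class PlanCache {
 public:
  using Factory = std::function<int(PlanCache&)>;

  // Returns the cached value for `key`, creating it with `make` on a miss.
  // `make` is allowed to call back into the cache on the same thread.
  int lookup_or_create(const std::string& key, const Factory& make) {
    assert(make && "factory must be callable");
    // A recursive mutex lets the owning thread lock again during re-entry.
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    auto it = plans_.find(key);
    if (it != plans_.end()) {
      return it->second;
    }
    int plan = make(*this);     // may re-enter lookup_or_create
    plans_.emplace(key, plan);  // insert after the nested call has returned
    return plan;
  }

  std::size_t size() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    return plans_.size();
  }

 private:
  std::recursive_mutex mutex_;
  std::unordered_map<std::string, int> plans_;
};
```

The trade-off in this sketch is that the lock is held while the factory runs, so other threads wait for the whole creation, not only for the map access.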
* Added DLPack make_tensor
* Add a self contained python calling MatX (calling python calling MatX) integration example

Co-authored-by: cliffburdick <cburdick@nvidia.com>
* Cleanup #define's in filter.cuh
* Cleanup #define's in other files
* Fix dereferencing type-punned pointer bug in Release mode
* Fix Werror=uninitialized compile error when MATX_EN_OPENBLAS=ON in Release mode
* Fix uninitialized variable bug in svd plan
* Update PrintTests for default tensor name
…IA#821) This PR introduces the implementation of a single versatile sparse tensor type that uses a tensor format DSL (Domain Specific Language) to describe a vast space of storage formats. Although the tensor format can easily define many common storage formats (such as Dense, COO, CSR, CSC, BSR), it can also define many less common storage formats. In addition, the tensor format DSL can be extended to include even more storage formats in the future. This first PR simply introduces all storage details for the single versatile sparse tensor type, together with some factory methods for constructing COO, CSR, and CSC sparse matrices from MatX buffers. Later PRs will introduce more general ways of constructing sparse tensors (e.g. from file) and actual operations like SpMV and SpMM using cuSPARSE.
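As background for the COO and CSR formats named above, the following is a small plain-C++ sketch of how the same matrix is laid out in each and how a CSR matrix-vector product walks the storage. It does not use the MatX sparse tensor type or its factory methods; `CsrMatrix`, `coo_to_csr`, and `spmv` are hypothetical helpers, and the COO input is assumed to be sorted by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container, for illustration only.
struct CsrMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> values;        // nonzero values, stored row by row
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<std::size_t> row_ptr;  // rows + 1 offsets into values/col_idx
};

// Convert COO triplets (assumed sorted by row) into CSR by compressing
// the row coordinates into per-row offsets.
CsrMatrix coo_to_csr(std::size_t rows, std::size_t cols,
                     const std::vector<std::size_t>& coo_row,
                     const std::vector<std::size_t>& coo_col,
                     const std::vector<double>& coo_val) {
  assert(coo_row.size() == coo_col.size());
  assert(coo_col.size() == coo_val.size());
  CsrMatrix m{rows, cols, coo_val, coo_col,
              std::vector<std::size_t>(rows + 1, 0)};
  for (std::size_t r : coo_row) {
    m.row_ptr[r + 1]++;  // count the nonzeros in each row
  }
  for (std::size_t r = 0; r < rows; ++r) {
    m.row_ptr[r + 1] += m.row_ptr[r];  // prefix sum turns counts into offsets
  }
  return m;
}

// y = A * x for a CSR matrix A.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
  assert(x.size() == a.cols);
  std::vector<double> y(a.rows, 0.0);
  for (std::size_t r = 0; r < a.rows; ++r) {
    for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
      y[r] += a.values[k] * x[a.col_idx[k]];
    }
  }
  return y;
}
```

CSC is the same idea with the roles of rows and columns swapped: the column coordinates are compressed into offsets and the row index of each nonzero is stored explicitly.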
* Do not create CUDA events in ephemeral executors
* Update to CCCL 2.8.0
* Fixed out-of-bounds bug in cov
* initial commit with a standalone docker for production and a devcontainer
* Add configurable scaling modes for pwelch, using custom reduction kernel that performs better than CUB when in memory FFT bin powers are {batches, nfft}
* Update pwelch documentation
* Move nvcc-specific features behind __CUDACC__ guards and add static_asserts for signal type
* Cleanup
* Start sparse tensor documentation
* typo
* rephrase
* more formats
* rewording and reformatting
* small edit
* missing _ in reference
* feat: added economic QR
* fix memory alloc in gesvdjBatched
* Add SpMV support for matvec transformation with tests and doc
* typo
* Support mixed-precision for SpMM

  Also fixes a few minor details related to zero-size allocation and host-side modification of device memory.
* use type trait for half
* Support mixed-precision for SpMV
* Add guard for Dss support to sparse solver test
* typo
* proper undef case
* proper undef handling
(1) finishes TODO on lvl type to include properties
(2) adds more constexpr test methods and shortcuts
(3) improves readability to format decls and testers
* Allow rank 4 tensors to be issued in one call to cublasLt. Added Rank 5 batched test.
Manual PR so that I can merge cudastf changes to the main branch