Merging cudastf branch to main branch #2


Merged
merged 114 commits into cudastf from cudastf_latest on Mar 19, 2025

Conversation

sidelnik (Owner)

Manual PR so that I can merge cudastf changes to the main branch

cliffburdick and others added 30 commits November 5, 2024 08:28
* Use FetchContent to import the local rapids-cmake.
* Fix solver interfaces to use executor in cache
* Add recursive mutex around cache lookup
* Added DLPack make_tensor

* Add a self-contained Python-calling-MatX (calling Python calling MatX) integration example

---------

Co-authored-by: cliffburdick <cburdick@nvidia.com>
* Cleanup #define's in filter.cuh

* Cleanup #define's in other files

* Fix dereferencing type-punned pointer bug in Release mode

* Fix Werror=uninitialized compile error when MATX_EN_OPENBLAS=ON in Release mode

* Fix uninitialized variable bug in svd plan

* Update PrintTests for default tensor name
… (NVIDIA#821)

This PR introduces the implementation of a single versatile sparse tensor type that uses a tensor format DSL (Domain Specific Language) to describe a vast space of storage formats. Although the tensor format can easily define many common storage formats (such as Dense, COO, CSR, CSC, BSR), it can also define many less common storage formats. In addition, the tensor format DSL can be extended to include even more storage formats in the future.

This first PR simply introduces all storage details for the single versatile sparse tensor type, together with some factory methods for constructing COO, CSR, and CSC sparse matrices from MatX buffers. Later PRs will introduce more general ways of constructing sparse tensors (e.g. from file) and actual operations like SpMV and SpMM using cuSPARSE.
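
For readers new to these formats, below is a minimal, library-agnostic sketch (plain C++, not the MatX factory API) of how COO and CSR lay out the same small matrix; the factory methods in this PR build the equivalent structures from user-supplied MatX buffers.

```cpp
// Illustration only (not the MatX API): how the 4x4 matrix
//   | 1 0 0 2 |
//   | 0 0 3 0 |
//   | 0 0 0 0 |
//   | 0 4 0 5 |
// is laid out in the COO and CSR storage formats.
#include <cstdio>
#include <vector>

int main() {
  // COO: one (row, col, value) triple per nonzero.
  std::vector<int>   coo_row = {0, 0, 1, 3, 3};
  std::vector<int>   coo_col = {0, 3, 2, 1, 3};
  std::vector<float> coo_val = {1, 2, 3, 4, 5};

  // CSR: row pointers compress the row coordinates; columns/values are shared.
  std::vector<int>   csr_ptr = {0, 2, 3, 3, 5};  // row i spans [ptr[i], ptr[i+1])
  std::vector<int>   csr_col = {0, 3, 2, 1, 3};
  std::vector<float> csr_val = {1, 2, 3, 4, 5};

  for (int i = 0; i < 4; i++)
    for (int p = csr_ptr[i]; p < csr_ptr[i + 1]; p++)
      std::printf("A(%d,%d) = %g\n", i, csr_col[p], csr_val[p]);
  return 0;
}
```

CSC is the same idea as CSR with the roles of rows and columns swapped.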
cliffburdick and others added 27 commits February 26, 2025 11:01
* Do not create CUDA events in ephemeral executors
* Update to CCCL 2.8.0

* Fixed out-of-bounds bug in cov
* initial commit with a standalone docker for production and a devcontainer
* Add configurable scaling modes for pwelch, using a custom reduction kernel that performs better than CUB when the in-memory FFT bin powers are {batches, nfft} (see the scaling sketch after this commit's notes)

* Update pwelch documentation

* Move nvcc-specific features behind __CUDACC__ guards and add static_asserts for signal type

* Cleanup
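
As a rough illustration of what scaling modes typically mean for Welch power spectra (this follows the common spectrum/density conventions and is an assumption about the modes added above, not MatX's exact mode names), applied to already-averaged FFT bin powers:

```cpp
// Illustration only: the two standard Welch scaling conventions applied to
// already-averaged FFT bin powers. fs is the sample rate, w is the window.
#include <vector>

void scale_pwelch(std::vector<double>& Pxx, const std::vector<double>& w,
                  double fs, bool density) {
  double s1 = 0.0, s2 = 0.0;
  for (double wi : w) { s1 += wi; s2 += wi * wi; }
  // "density" gives power per Hz (V^2/Hz); "spectrum" gives power per bin (V^2).
  const double scale = density ? 1.0 / (fs * s2) : 1.0 / (s1 * s1);
  for (double& p : Pxx) p *= scale;
}
```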
* Start sparse tensor documentation

* typo

* rephrase

* more formats

* rewording and reformatting

* small edit

* missing _ in reference

* add references to sparse section

* remove _ from label reference
* feat: added economic QR (see the note below)

* fix memory alloc in gesvdjBatched
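
For context on the QR addition above: the economic (thin) factorization keeps only the leading columns of $Q$. Assuming $A \in \mathbb{R}^{m \times n}$ with $m \ge n$,

$$
A = Q_1 R_1, \qquad Q_1 \in \mathbb{R}^{m \times n} \ \text{(orthonormal columns)}, \qquad R_1 \in \mathbb{R}^{n \times n} \ \text{(upper triangular)},
$$

which is cheaper to store than the full QR, where $Q \in \mathbb{R}^{m \times m}$ and $R \in \mathbb{R}^{m \times n}$.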
* Add SpMV support for the matvec transformation (see the CSR sketch below)

with tests and doc

* typo
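
Below is a minimal generic sketch of the operation the new SpMV path performs; the actual implementation is cuSPARSE-backed per the sparse-tensor plan above, so this plain-C++ CSR loop is only for orientation.

```cpp
// Illustration only: y = A * x with A stored in CSR.
#include <cstddef>
#include <vector>

void spmv_csr(const std::vector<int>& ptr, const std::vector<int>& col,
              const std::vector<float>& val, const std::vector<float>& x,
              std::vector<float>& y) {
  for (std::size_t i = 0; i + 1 < ptr.size(); i++) {
    float acc = 0.0f;
    for (int p = ptr[i]; p < ptr[i + 1]; p++)
      acc += val[p] * x[col[p]];
    y[i] = acc;
  }
}
```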
* Support mixed-precision for SpMM (illustrated in the sketch below)

Also fixes a few minor details related to zero-size allocation
and host-side modification of device memory.

* use type trait for half
* Support mixed-precision for SpMV
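
A hedged sketch of the mixed-precision idea for the SpMM/SpMV commits above (generic C++, not the cuSPARSE/MatX code path): values can be stored in a narrower type than the one used to accumulate, e.g. half-precision inputs with float accumulation.

```cpp
// Illustration only: C = A * B with A in CSR, values stored as ValT
// (e.g. a 16-bit float type) and accumulated as AccT (e.g. float).
#include <cstddef>
#include <vector>

template <typename ValT, typename AccT>
void spmm_csr(int m, int n, const std::vector<int>& ptr,
              const std::vector<int>& col, const std::vector<ValT>& val,
              const std::vector<ValT>& B,   // k x n, row-major
              std::vector<AccT>& C) {       // m x n, row-major
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++) {
      AccT acc{};
      for (int p = ptr[i]; p < ptr[i + 1]; p++)
        acc += AccT(val[p]) * AccT(B[std::size_t(col[p]) * n + j]);
      C[std::size_t(i) * n + j] = acc;
    }
}
```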
* Add guard for Dss support to sparse solver test

* typo

* proper undef case

* proper undef handling
(1) finishes TODO on lvl type to include properties
(2) adds more constexpr test methods and shortcuts
(3) improves readability of format decls and testers
* Allow rank 4 tensors to be issued in one call to cublasLt. Added rank-5 batched test.
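
As a rough usage sketch of what the higher-rank batching looks like at the MatX level (assuming the usual make_tensor/matmul spelling; the shapes are made up for illustration, and the single cublasLt call happens underneath):

```cpp
// Sketch: the two leading dimensions of rank-4 operands act as batch
// dimensions, so the whole product maps to one batched GEMM.
#include <matx.h>

int main() {
  using namespace matx;
  auto A = make_tensor<float>({4, 8, 128, 64});  // [b1, b2, m, k]
  auto B = make_tensor<float>({4, 8, 64, 32});   // [b1, b2, k, n]
  auto C = make_tensor<float>({4, 8, 128, 32});  // [b1, b2, m, n]
  // (fill A and B as needed before running)
  (C = matmul(A, B)).run();                      // batched GEMM in one launch
  return 0;
}
```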
sidelnik marked this pull request as ready for review March 19, 2025 22:19
sidelnik merged commit 6437eab into cudastf Mar 19, 2025
sidelnik deleted the cudastf_latest branch March 19, 2025 22:25
Labels: None yet
9 participants