Merging cudastf branch to main branch #2


Merged
merged 114 commits into cudastf from cudastf_latest on Mar 19, 2025

Conversation

sidelnik (Owner)

Manual PR so that I can merge cudastf changes to the main branch

cliffburdick and others added 30 commits November 5, 2024 08:28
* Use FetchContent to import the local rapids-cmake.
* Fix solver interfaces to use executor in cache
* Add recursive mutex around cache lookup
* Added DLPack make_tensor

* Add a self-contained Python-calling-MatX (calling Python calling MatX) integration example

---------

Co-authored-by: cliffburdick <cburdick@nvidia.com>
* Cleanup #define's in filter.cuh

* Cleanup #define's in other files

* Fix dereferencing type-punned pointer bug in Release mode

* Fix Werror=uninitialized compile error when MATX_EN_OPENBLAS=ON in Release mode

* Fix uninitialized variable bug in svd plan

* Update PrintTests for default tensor name
… (NVIDIA#821)

This PR introduces the implementation of a single versatile sparse tensor type that uses a tensor format DSL (Domain Specific Language) to describe a vast space of storage formats. Although the tensor format can easily define many common storage formats (such as Dense, COO, CSR, CSC, BSR), it can also define many less common storage formats. In addition, the tensor format DSL can be extended to include even more storage formats in the future.

This first PR simply introduces all storage details for the single versatile sparse tensor type, together with some factory methods for constructing COO, CSR, and CSC sparse matrices from MatX buffers. Later PRs will introduce more general ways of constructing sparse tensors (e.g. from file) and actual operations like SpMV and SpMM using cuSPARSE.
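
For readers new to these formats, below is a minimal, library-agnostic sketch (plain C++, not the MatX factory API) of how COO and CSR lay out the same small matrix; the factory methods in this PR build the equivalent structures from user-supplied MatX buffers.

```cpp
// Illustration only (not the MatX API): how the 4x4 matrix
//   | 1 0 0 2 |
//   | 0 0 3 0 |
//   | 0 0 0 0 |
//   | 0 4 0 5 |
// is laid out in the COO and CSR storage formats.
#include <cstdio>
#include <vector>

int main() {
  // COO: one (row, col, value) triple per nonzero.
  std::vector<int>   coo_row = {0, 0, 1, 3, 3};
  std::vector<int>   coo_col = {0, 3, 2, 1, 3};
  std::vector<float> coo_val = {1, 2, 3, 4, 5};

  // CSR: row pointers compress the row coordinates; columns/values are shared.
  std::vector<int>   csr_ptr = {0, 2, 3, 3, 5};  // row i spans [ptr[i], ptr[i+1])
  std::vector<int>   csr_col = {0, 3, 2, 1, 3};
  std::vector<float> csr_val = {1, 2, 3, 4, 5};

  for (int i = 0; i < 4; i++)
    for (int p = csr_ptr[i]; p < csr_ptr[i + 1]; p++)
      std::printf("A(%d,%d) = %g\n", i, csr_col[p], csr_val[p]);
  return 0;
}
```

CSC is the same idea as CSR with the roles of rows and columns swapped.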
cliffburdick and others added 27 commits February 26, 2025 11:01
* Do not create CUDA events in ephemeral executors
* Update to CCCL 2.8.0

* Fixed out-of-bounds bug in cov
* initial commit with a standalone docker for production and a devcontainer
* Add configurable scaling modes for pwelch, using a custom reduction kernel that performs better than CUB when the in-memory FFT bin powers are {batches, nfft} (see the scaling sketch after this commit's notes)

* Update pwelch documentation

* Move nvcc-specific features behind __CUDACC__ guards and add static_asserts for signal type

* Cleanup
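
As a rough illustration of what scaling modes typically mean for Welch power spectra (this follows the common spectrum/density conventions and is an assumption about the modes added above, not MatX's exact mode names), applied to already-averaged FFT bin powers:

```cpp
// Illustration only: the two standard Welch scaling conventions applied to
// already-averaged FFT bin powers. fs is the sample rate, w is the window.
#include <vector>

void scale_pwelch(std::vector<double>& Pxx, const std::vector<double>& w,
                  double fs, bool density) {
  double s1 = 0.0, s2 = 0.0;
  for (double wi : w) { s1 += wi; s2 += wi * wi; }
  // "density" gives power per Hz (V^2/Hz); "spectrum" gives power per bin (V^2).
  const double scale = density ? 1.0 / (fs * s2) : 1.0 / (s1 * s1);
  for (double& p : Pxx) p *= scale;
}
```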
* Start sparse tensor documentation

* typo

* rephrase

* more formats

* rewording and reformatting

* small edit

* missing _ in reference

* add references to sparse section

* remove _ from label reference
* feat: added economic QR (see the note below)

* fix memory alloc in gesvdjBatched
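
For context on the QR addition above: the economic (thin) factorization keeps only the leading columns of $Q$. Assuming $A \in \mathbb{R}^{m \times n}$ with $m \ge n$,

$$
A = Q_1 R_1, \qquad Q_1 \in \mathbb{R}^{m \times n} \ \text{(orthonormal columns)}, \qquad R_1 \in \mathbb{R}^{n \times n} \ \text{(upper triangular)},
$$

which is cheaper to store than the full QR, where $Q \in \mathbb{R}^{m \times m}$ and $R \in \mathbb{R}^{m \times n}$.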
* Add SpMV support for the matvec transformation (see the CSR sketch below)

with tests and doc

* typo
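
Below is a minimal generic sketch of the operation the new SpMV path performs; the actual implementation is cuSPARSE-backed per the sparse-tensor plan above, so this plain-C++ CSR loop is only for orientation.

```cpp
// Illustration only: y = A * x with A stored in CSR.
#include <cstddef>
#include <vector>

void spmv_csr(const std::vector<int>& ptr, const std::vector<int>& col,
              const std::vector<float>& val, const std::vector<float>& x,
              std::vector<float>& y) {
  for (std::size_t i = 0; i + 1 < ptr.size(); i++) {
    float acc = 0.0f;
    for (int p = ptr[i]; p < ptr[i + 1]; p++)
      acc += val[p] * x[col[p]];
    y[i] = acc;
  }
}
```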
* Support mixed-precision for SpMM (illustrated in the sketch below)

Also fixes a few minor details related to zero-size allocation
and host-side modification of device memory.

* use type trait for half
* Support mixed-precision for SpMV
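
A hedged sketch of the mixed-precision idea for the SpMM/SpMV commits above (generic C++, not the cuSPARSE/MatX code path): values can be stored in a narrower type than the one used to accumulate, e.g. half-precision inputs with float accumulation.

```cpp
// Illustration only: C = A * B with A in CSR, values stored as ValT
// (e.g. a 16-bit float type) and accumulated as AccT (e.g. float).
#include <cstddef>
#include <vector>

template <typename ValT, typename AccT>
void spmm_csr(int m, int n, const std::vector<int>& ptr,
              const std::vector<int>& col, const std::vector<ValT>& val,
              const std::vector<ValT>& B,   // k x n, row-major
              std::vector<AccT>& C) {       // m x n, row-major
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++) {
      AccT acc{};
      for (int p = ptr[i]; p < ptr[i + 1]; p++)
        acc += AccT(val[p]) * AccT(B[std::size_t(col[p]) * n + j]);
      C[std::size_t(i) * n + j] = acc;
    }
}
```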
* Add guard for Dss support to sparse solver test

* typo

* proper undef case

* proper undef handling
(1) finishes TODO on lvl type to include properties
(2) adds more constexpr test methods and shortcuts
(3) improves readability of format decls and testers
* Allow rank 4 tensors to be issued in one call to cublasLt. Added rank-5 batched test.
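
As a rough usage sketch of what the higher-rank batching looks like at the MatX level (assuming the usual make_tensor/matmul spelling; the shapes are made up for illustration, and the single cublasLt call happens underneath):

```cpp
// Sketch: the two leading dimensions of rank-4 operands act as batch
// dimensions, so the whole product maps to one batched GEMM.
#include <matx.h>

int main() {
  using namespace matx;
  auto A = make_tensor<float>({4, 8, 128, 64});  // [b1, b2, m, k]
  auto B = make_tensor<float>({4, 8, 64, 32});   // [b1, b2, k, n]
  auto C = make_tensor<float>({4, 8, 128, 32});  // [b1, b2, m, n]
  // (fill A and B as needed before running)
  (C = matmul(A, B)).run();                      // batched GEMM in one launch
  return 0;
}
```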
sidelnik marked this pull request as ready for review March 19, 2025 22:19
sidelnik merged commit 6437eab into cudastf Mar 19, 2025
sidelnik deleted the cudastf_latest branch March 19, 2025 22:25
Labels: None yet
9 participants