2,908 questions
0 votes, 0 answers, 22 views
ONNX Script rewriter: how to match patterns with multiple outputs?
I am trying to implement this rewrite rule from the TASO paper with the ONNX Script rewriter. However, I cannot figure out how to implement a pattern with multiple outputs X and Y.
The ONNX Script does ...
2 votes, 1 answer, 75 views
Having trouble with R's torch and tensor dimensions
I am trying to follow along with this webpage: https://jtr13.github.io/cc21fall2/tutorial-on-r-torch-package.html
I am trying to understand R's implementation of PyTorch.
I am having some trouble with ...
1 vote, 2 answers, 126 views
PyTorch module B=A, A.to('cpu'), but the tensor in B is still on the GPU, why?
After moving module A to the CPU, the original parameter tensor still stays on the GPU. When is it released? Is it wrong to reuse the parameter?
My code:
import torch.nn as nn
class A(nn.Module):
...
2 votes, 1 answer, 25 views
PyTorch .view() operation to manipulate tensor dimensions vis-à-vis using torch.unbind followed by torch.cat
In Torch, .view() reshapes the tensor. However, there are multiple ways to reshape a multi-dimensional tensor to a target shape. How does it decide between those different ways?
For example, in Torch, ...
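As a minimal sketch of the distinction raised here: `.view()` does not choose between orderings. It reinterprets the existing contiguous, row-major memory without moving any element, whereas `torch.unbind` followed by `torch.cat` builds a new tensor whose element order depends on the dimension that was unbound. The tensor `x` and its shape below are hypothetical, not taken from the question.

```python
import torch

# Hypothetical 2x3 example tensor: [[0, 1, 2], [3, 4, 5]]
x = torch.arange(6).reshape(2, 3)

# .view() keeps the row-major element order 0,1,2,3,4,5 and only
# changes how that flat sequence is split into rows.
v = x.view(3, 2)

# unbind along dim=1 yields the columns [0,3], [1,4], [2,5];
# concatenating them changes the flat order to 0,3,1,4,2,5.
c = torch.cat(torch.unbind(x, dim=1)).view(3, 2)

# v shares storage with x; c is a freshly allocated tensor.
shares_storage = v.data_ptr() == x.data_ptr()
```

Here `v` is `[[0, 1], [2, 3], [4, 5]]` while `c` is `[[0, 3], [1, 4], [2, 5]]`, so the two routes to the same shape are not interchangeable.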
0 votes, 0 answers, 25 views
Change a contravariant tensor to a covariant tensor in the einsteinpy package
Using the einsteinpy package for Python, I am defining the electromagnetic tensor (or any other arbitrary tensor). I define it as a 'uu' tensor using the BaseRelativityTensor class file. ...
0 votes, 0 answers, 66 views
Decomposition of a large matrix in CP format (sum of products of matrices)
I have a matrix A of size n^2 by n^2, and I want to know whether, for a given accuracy (or a given number r), there is a way to express A as the sum of Bi kron Ci for i = 1...r,
where Bi and Ci are n by n, i.e.
...
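One known route to such a sum, sketched here under assumptions and not taken from the question, is the Van Loan-Pitsianis rearrangement: reorder the entries of A so that each Kronecker term becomes a rank-one matrix, then truncate an SVD of the rearranged matrix. The function names `kron_sum_decompose` and `kron_sum` are hypothetical, and the sketch uses NumPy with row-major reshapes.

```python
import numpy as np

def kron_sum_decompose(A, n, r):
    """Return lists Bs, Cs of n-by-n matrices with A ~ sum_t kron(Bs[t], Cs[t])."""
    # np.kron(B, C)[i*n + k, j*n + l] == B[i, j] * C[k, l],
    # so this reshape exposes the axes in the order (i, k, j, l).
    A4 = A.reshape(n, n, n, n)
    # Rearrange so that rows index (i, j) and columns index (k, l):
    # each Kronecker term then becomes the outer product vec(B) vec(C)^T.
    R = A4.transpose(0, 2, 1, 3).reshape(n * n, n * n)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Keep the r largest singular triplets, splitting each singular
    # value evenly between the two factors.
    Bs = [np.sqrt(s[t]) * U[:, t].reshape(n, n) for t in range(r)]
    Cs = [np.sqrt(s[t]) * Vt[t].reshape(n, n) for t in range(r)]
    return Bs, Cs

def kron_sum(Bs, Cs):
    """Rebuild the n^2-by-n^2 matrix from the factor lists."""
    return sum(np.kron(B, C) for B, C in zip(Bs, Cs))

# Usage on a hypothetical matrix built from two Kronecker terms.
rng = np.random.default_rng(0)
n = 3
A = sum(np.kron(rng.standard_normal((n, n)), rng.standard_normal((n, n)))
        for _ in range(2))
Bs, Cs = kron_sum_decompose(A, n, r=2)
```

Because the rearrangement only permutes entries, the Frobenius error of the truncated sum equals that of the truncated SVD, so the decay of the singular values `s` indicates how large r must be for a given accuracy.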
1 vote, 1 answer, 316 views
How are fp6 and fp4 supported on NVIDIA Tensor Core on Blackwell?
I am writing PTX assembly code on CUDA C++ for research. This is my setup:
I downloaded the latest CUDA C++ toolkit (13.0) yesterday on WSL Linux.
The local compilation environment does not ...
3 votes, 1 answer, 66 views
Learnable parameter only updating non-zero values and identically during training [closed]
I have the following code in Python 3.11 using PyTorch:
arr = np.array([[0, 0, 0],
[0, -1, 0],
[0, 1, 0]])
arr_tensor = torch.tensor(arr, dtype=torch.float32, device=...
0 votes, 1 answer, 36 views
RuntimeError when trying to run suno/bark-small on GPU
When I run:
from transformers import AutoProcessor, BarkModel
import os
from scipy.io.wavfile import write as write_wav
CUDA_VISIBLE_DEVICES=0
os.environ["SUNO_OFFLOAD_CPU"] = "True"...
1 vote, 0 answers, 106 views
Rewriting an n-dimensional matrix of dot products as a matrix multiplication
This is a crosspost from the Math Stack Exchange forum. It seems to me that this question can be approached in two different ways, so I am curious about different approaches.
https://math.stackexchange.com/...
1 vote, 0 answers, 41 views
Difference between tokens generated on a configuration in two different contexts
I have a model that, given a configuration or state (of a Rubik's cube, but in any case a sequence of integers), generates a movement (from 0 to 5). This movement can be used to bring the ...
1 vote, 1 answer, 63 views
Matrix Multiply with Vector and Tensor in Python
I have a vector, M, with size N and a tensor, d, with size NxNxD.
My aim is to perform the matrix multiplication M*d[i,:,:] for each i to get a new matrix with size NxD.
Now I could just do it like this:
...
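One loop-free way to express the product described above is `np.einsum`, sketched here under the assumption that M and d are NumPy arrays. The names `M` and `d` follow the question, while the sizes and random data are hypothetical.

```python
import numpy as np

N, D = 4, 3  # hypothetical sizes
rng = np.random.default_rng(0)
M = rng.standard_normal(N)          # vector of size N
d = rng.standard_normal((N, N, D))  # tensor of size N x N x D

# Reference: one vector-matrix product per slice d[i], stacked to N x D.
loop_result = np.stack([M @ d[i] for i in range(N)])

# Same contraction in one call: sum over j of M[j] * d[i, j, :].
einsum_result = np.einsum("j,ijd->id", M, d)
```

The subscript string spells out which axis is summed (j) and which axes survive (i and d), which makes the intent easier to check than a chain of transposes and reshapes.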
0 votes, 1 answer, 38 views
Why does TensorDataset divide the data into minibatches?
Why does TensorDataset divide the data into minibatches? For example, when given a 2D array, instead of yielding 2D tensors as batches, it sets the required batches to be minibatches, and its actual ...
0 votes, 1 answer, 343 views
Distinction between CuTe and NVIDIA CUTLASS
I'm confused about what exactly is handled by CuTe and what by CUTLASS.
From my understanding, CUTLASS handles the following:
GEMM computation of CuTe Tensors
Communication between CPU and GPU
Abstract memory ...
1 vote, 0 answers, 43 views
How to optimize CPU tensor slicing and asynchronous transfer to the GPU?
My code involves slicing large tensors on the CPU by index and asynchronously transferring them back to the GPU. However, using the Profiler debugging tool, I found that this step would seriously ...