You may need this software in the following cases:
- Managing memory allocation yourself. The framework's built-in memory allocation mechanism can get in your way: its complicated caching allocator can fragment device memory.
- Unified, framework-agnostic memory management operations.
- Customized communication patterns. With PyTorch alone you cannot implement GPU P2P communication, since the NCCL backend only supports collective communication APIs. With this software you can implement it on top of CUDA-level libraries (see the sketch after this list).
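As a rough illustration of what "CUDA-level" means here, the sketch below uses plain CUDA runtime calls: `cudaMalloc` for manual allocation outside any framework caching allocator, and `cudaMemcpyPeer` for a point-to-point copy between two GPUs. It is not this project's API; the device IDs, buffer size, and error handling are assumptions made only for the example.

```cpp
// Minimal sketch (not this project's API): manual device allocation and a
// GPU-to-GPU peer copy using the CUDA runtime. Assumes at least two GPUs.
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err__ = (call);                                        \
        if (err__ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err__));\
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main() {
    const size_t nbytes = 1 << 20;  // 1 MiB buffer, arbitrary size
    void *src = nullptr, *dst = nullptr;

    // Manual allocation: memory comes straight from the CUDA driver,
    // bypassing any framework-level caching allocator.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, nbytes));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, nbytes));

    // Enable direct peer access if the hardware supports it (NVLink / PCIe P2P).
    // Without it, cudaMemcpyPeer still works but stages the copy through host memory.
    int can_access = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_access, 1, 0));
    if (can_access) {
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
    }

    // Point-to-point copy from GPU 0 to GPU 1 -- no collective call involved.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, nbytes));

    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(dst));
    return 0;
}
```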
Build and install:

```bash
mkdir build && cd build && cmake .. && make
pip install `find . -name "*whl"`
```
See the PyTorch Example and TensorFlow Example for details. More features are work in progress.