From the course: NVIDIA Certified Associate AI Infrastructure and Operations (NCA-AIIO) Cert Prep
NVIDIA collective communications library (NCCL) - NVIDIA Tutorial
Let's talk about the NVIDIA Collective Communications Library. There are scenarios in model training or model inference where multiple GPUs have to communicate with each other. Now, these GPUs may exist in the same system, or they may be in different systems. So should they use NVLink? Should they use NVSwitch? Should they use RDMA? Some transport has to carry the communication between them. Rather than handling all of this programmatically ourselves, we rely on this library, which takes care of all communication for your GPUs. So let's focus on the NVIDIA Collective Communications Library, NCCL. NCCL is a multi-GPU communication library that provides abstractions and optimized patterns for communication between many GPUs. Rather than me initiating a connection, then closing the connection, and communicating through the different modes of transport myself, I let NCCL use the underlying architecture and perform optimized communication on top of it. So here we have host 1. Host 1 GPUs can…
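To make the idea concrete, here is a minimal sketch of what "letting NCCL take care of the communication" looks like in code: a single process drives several GPUs in one host and issues an all-reduce, and NCCL itself chooses the transport (NVLink, NVSwitch, PCIe, or the network). This is an illustrative sketch, not from the course; it assumes 4 GPUs and omits error checking, and it requires CUDA, the NCCL library, and multi-GPU hardware to actually run.

```c
/* Sketch: single-process, multi-GPU all-reduce with NCCL.
   Assumes 4 local GPUs; error checking trimmed for brevity. */
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  const int nDev = 4;            /* assumed number of GPUs */
  const size_t count = 1 << 20;  /* elements per GPU */
  int devs[4] = {0, 1, 2, 3};

  ncclComm_t comms[4];
  cudaStream_t streams[4];
  float *sendbuff[4], *recvbuff[4];

  /* Allocate buffers and a stream on each device. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void **)&sendbuff[i], count * sizeof(float));
    cudaMalloc((void **)&recvbuff[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* One call builds communicators for all local GPUs. */
  ncclCommInitAll(comms, nDev, devs);

  /* The caller issues one collective per GPU inside a group;
     NCCL decides how the data actually moves between devices. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for completion, then clean up. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuff[i]);
    cudaFree(recvbuff[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

Notice that nothing in the code names a transport: the same program runs over NVLink, NVSwitch, PCIe, or RDMA-capable NICs, which is exactly the abstraction the transcript describes.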
Contents
- NVIDIA: Powering AI GPU innovation (2m 37s)
- NVIDIA technology stack (3m 12s)
- Layer 1: Physical layer (3m 53s)
- GPU on a graphics card (1m 57s)
- DGX platform (2m 56s)
- DGX SuperPOD (1m 57s)
- ConnectX (1m 49s)
- BlueField DPUs (2m 32s)
- NVIDIA reference architectures (1m 38s)
- Understanding GPU cores (5m)
- Comparing GPU cores (4m 18s)
- NVIDIA DGX platform: Timeline (4m 47s)
- DGX platform: Deployment options (3m 38s)
- DGX A100 vs. H100 (4m 6s)
- Layer 2: Data movement and I/O acceleration (59s)
- NVLink (8m 5s)
- InfiniBand (2m 5s)
- InfiniBand vs. Ethernet (1m 43s)
- DMA and RDMA (6m 30s)
- GPUDirect RDMA (2m 44s)
- GPUDirect storage (1m 45s)
- Quick comparison (1m 56s)
- Layer 3: OS, driver, and virtualization (2m 17s)
- GPU drivers (4m 38s)
- GPU virtualization (5m 8s)
- vGPU vs. MIG, part 1 (7m 48s)
- vGPU vs. MIG, part 2 (10m 59s)
- Layer 4: Core libraries (6m 44s)
- Compute unified device architecture (CUDA) (3m 12s)
- Installing CUDA (2m 11s)
- NVIDIA collective communications library (NCCL) (3m 41s)
- NVLink, NVSwitch, PCIe, RDMA vs. NCCL (3m 44s)
- Layer 5: Monitoring and management (2m 23s)
- NVIDIA-SMI (4m 24s)
- Data Center GPU Manager (DCGM) (7m 27s)
- Base Command Manager (5m 33s)
- Which one to use? (2m 3s)
- Layer 6: Applications and vertical solutions (3m 48s)
- Summary (2m 26s)
- NVIDIA AI Enterprise (3m 2s)
- NVIDIA AI Factory (2m 24s)