From the course: NVIDIA Certified Associate AI Infrastructure and Operations (NCA-AIIO) Cert Prep
NVIDIA collective communications library (NCCL) - NVIDIA Tutorial
Let's talk about the NVIDIA Collective Communications Library. There are scenarios in model training or model inference where multiple GPUs have to communicate with each other. Now, these GPUs may exist in the same system, or they may be in different systems. So should they use NVLink? Should they use NVSwitch? Should they use RDMA? Some transport has to carry the communication between them. Rather than handling all of this programmatically ourselves, we rely on this library, which takes care of all communication for your GPUs. So let's focus on the NVIDIA Collective Communications Library, NCCL. NCCL is a multi-GPU communication library that provides abstractions and optimized patterns for communication between many GPUs. Rather than me initiating a connection, then closing the connection, and communicating through the different modes of transport myself, I let NCCL use the underlying architecture and perform optimized communication on top of it. So here we have host 1. Host 1 GPUs can…
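To make the idea concrete, here is a minimal sketch of what "letting NCCL take care of the communication" looks like in code: a single process drives several GPUs in one host and issues an all-reduce, and NCCL itself chooses the transport (NVLink, NVSwitch, PCIe, or the network). This is an illustrative sketch, not from the course; it assumes 4 GPUs and omits error checking, and it requires CUDA, the NCCL library, and multi-GPU hardware to actually run.

```c
/* Sketch: single-process, multi-GPU all-reduce with NCCL.
   Assumes 4 local GPUs; error checking trimmed for brevity. */
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  const int nDev = 4;            /* assumed number of GPUs */
  const size_t count = 1 << 20;  /* elements per GPU */
  int devs[4] = {0, 1, 2, 3};

  ncclComm_t comms[4];
  cudaStream_t streams[4];
  float *sendbuff[4], *recvbuff[4];

  /* Allocate buffers and a stream on each device. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void **)&sendbuff[i], count * sizeof(float));
    cudaMalloc((void **)&recvbuff[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* One call builds communicators for all local GPUs. */
  ncclCommInitAll(comms, nDev, devs);

  /* The caller issues one collective per GPU inside a group;
     NCCL decides how the data actually moves between devices. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for completion, then clean up. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuff[i]);
    cudaFree(recvbuff[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

Notice that nothing in the code names a transport: the same program runs over NVLink, NVSwitch, PCIe, or RDMA-capable NICs, which is exactly the abstraction the transcript describes.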
Contents
- NVIDIA: Powering AI GPU innovation (2m 37s)
- NVIDIA technology stack (3m 12s)
- Layer 1: Physical layer (3m 53s)
- GPU on a graphics card (1m 57s)
- DGX platform (2m 56s)
- DGX SuperPOD (1m 57s)
- ConnectX (1m 49s)
- BlueField DPUs (2m 32s)
- NVIDIA reference architectures (1m 38s)
- Understanding GPU cores (5m)
- Comparing GPU cores (4m 18s)
- NVIDIA DGX platform: Timeline (4m 47s)
- DGX platform: Deployment options (3m 38s)
- DGX A100 vs. H100 (4m 6s)
- Layer 2: Data movement and I/O acceleration (59s)
- NVLink (8m 5s)
- InfiniBand (2m 5s)
- InfiniBand vs. Ethernet (1m 43s)
- DMA and RDMA (6m 30s)
- GPUDirect RDMA (2m 44s)
- GPUDirect storage (1m 45s)
- Quick comparison (1m 56s)
- Layer 3: OS, driver, and virtualization (2m 17s)
- GPU drivers (4m 38s)
- GPU virtualization (5m 8s)
- vGPU vs. MIG, part 1 (7m 48s)
- vGPU vs. MIG, part 2 (10m 59s)
- Layer 4: Core libraries (6m 44s)
- Compute unified device architecture (CUDA) (3m 12s)
- Installing CUDA (2m 11s)
- NVIDIA collective communications library (NCCL) (3m 41s)
- NVLink, NVSwitch, PCIe, RDMA vs. NCCL (3m 44s)
- Layer 5: Monitoring and management (2m 23s)
- NVIDIA-SMI (4m 24s)
- Data Center GPU Manager (DCGM) (7m 27s)
- Base Command Manager (5m 33s)
- Which one to use? (2m 3s)
- Layer 6: Applications and vertical solutions (3m 48s)
- Summary (2m 26s)
- NVIDIA AI Enterprise (3m 2s)
- NVIDIA AI Factory (2m 24s)