From the course: NVIDIA Certified Associate AI Infrastructure and Operations (NCA-AIIO) Cert Prep
Layer 3: OS, driver, and virtualization - NVIDIA Tutorial
I hope you are getting a better understanding of the NVIDIA software and hardware stack. We started with the physical layer, and then we discussed data movement and I/O acceleration. Once those are in place, we need an operating system and a driver layer on top of them, and that is where OS, driver, and virtualization come into the picture. So let's focus on this layer, layer 3, which covers running an operating system that is fine-tuned for DGX systems, the GPU drivers, and the virtualization of GPUs.

Let's first talk about the DGX operating system. Any system needs an OS to run and function properly. There are Windows- and Linux-based operating systems, which are fine-tuned for specific use cases. Then there are virtualization operating systems, or hypervisors, which are also designed…
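To make the layering concrete, here is a minimal sketch that models the stack as an ordered mapping and lists what has to be in place underneath a given layer. The layer names come from the course outline; the `LAYERS` mapping and the `layers_below` helper are hypothetical names introduced only for illustration, not part of any NVIDIA tool.

```python
# Illustrative model of the six-layer stack covered in this course.
# Layer names follow the course outline; the helper is an assumption for this sketch.
LAYERS = {
    1: "Physical layer",
    2: "Data movement and I/O acceleration",
    3: "OS, driver, and virtualization",
    4: "Core libraries",
    5: "Monitoring and management",
    6: "Applications and vertical solutions",
}


def layers_below(layer: int) -> list[str]:
    """Return the names of the layers that sit underneath `layer`, bottom first."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return [LAYERS[n] for n in sorted(LAYERS) if n < layer]


if __name__ == "__main__":
    # Layer 3 builds on the two layers discussed earlier in the course.
    for name in layers_below(3):
        print(name)
```

Calling `layers_below(3)` returns the physical layer followed by data movement and I/O acceleration, which matches the order in which the course has covered them so far.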
Contents
- NVIDIA: Powering AI GPU innovation (2m 37s)
- NVIDIA technology stack (3m 12s)
- Layer 1: Physical layer (3m 53s)
- GPU on a graphics card (1m 57s)
- DGX platform (2m 56s)
- DGX SuperPOD (1m 57s)
- ConnectX (1m 49s)
- BlueField DPUs (2m 32s)
- NVIDIA reference architectures (1m 38s)
- Understanding GPU cores (5m)
- Comparing GPU cores (4m 18s)
- NVIDIA DGX platform: Timeline (4m 47s)
- DGX platform: Deployment options (3m 38s)
- DGX A100 vs. H100 (4m 6s)
- Layer 2: Data movement and I/O acceleration (59s)
- NVLink (8m 5s)
- InfiniBand (2m 5s)
- InfiniBand vs. Ethernet (1m 43s)
- DMA and RDMA (6m 30s)
- GPUDirect RDMA (2m 44s)
- GPUDirect storage (1m 45s)
- Quick comparison (1m 56s)
- Layer 3: OS, driver, and virtualization (2m 17s)
- GPU drivers (4m 38s)
- GPU virtualization (5m 8s)
- vGPU vs. MIG, part 1 (7m 48s)
- vGPU vs. MIG, part 2 (10m 59s)
- Layer 4: Core libraries (6m 44s)
- Compute unified device architecture (CUDA) (3m 12s)
- Installing CUDA (2m 11s)
- NVIDIA collective communications library (NCCL) (3m 41s)
- NVLink, NVSwitch, PCIe, RDMA vs. NCCL (3m 44s)
- Layer 5: Monitoring and management (2m 23s)
- NVIDIA-SMI (4m 24s)
- Data Center GPU Manager (DCGM) (7m 27s)
- Base Command Manager (5m 33s)
- Which one to use? (2m 3s)
- Layer 6: Applications and vertical solutions (3m 48s)
- Summary (2m 26s)
- NVIDIA AI Enterprise (3m 2s)
- NVIDIA AI Factory (2m 24s)