Tutorials#

The best way to get started with NeMo is to start with one of our tutorials. They cover various domains and provide both introductory and advanced topics.

These tutorials can be run from inside the NeMo Framework Docker Container.

Large Language Models#

Data Curation#

Explore examples of data curation techniques using NeMo Curator:

- Distributed Data Classification: Shows how to use NeMo Curator with two distinct classifiers, one for evaluating data quality and another for identifying data domains. Combining these classifiers streamlines the annotation process and helps blend the diverse datasets essential for training foundation models.

- PEFT Curation: Demonstrates how to use the NeMo Curator Python API to curate a dataset for Parameter Efficient Fine-Tuning (PEFT). It uses the Enron dataset of emails with classification labels, where each entry includes a subject, body, and category (class label), and showcases the filtering and processing operations that can be applied to each record.

- Single Node Data Curation Pipeline: Walks through a typical data curation pipeline with NeMo Curator, using the Thai Wikipedia dataset as an example. It demonstrates how to download Wikipedia data with NeMo Curator, perform language separation with FastText, apply GPU-based exact and fuzzy deduplication, and run CPU-based heuristic filtering.

- NeMo Curator Python API with TinyStories: Shows how to use the NeMo Curator Python API to curate the TinyStories dataset, a collection of short stories generated by GPT-3.5 and GPT-4 that use only words understood by 3- to 4-year-olds. Its small size makes it ideal for creating and validating curation pipelines.

- Curating Datasets for Parameter Efficient Fine-tuning (PEFT) with Synthetic Data Generation (SDG): Demonstrates how to use the NeMo Curator Python API for data curation, synthetic data generation, and qualitative score assignment to prepare a dataset for PEFT of LLMs.

- Custom Tokenization for Domain Adaptive Pre-Training (DAPT): Walks through the custom tokenization workflow required for DAPT, including training a customized tokenizer, preprocessing the dataset, and altering the checkpoint's embedding table.
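The exact-deduplication step described in the Single Node Data Curation Pipeline tutorial can be illustrated with a short, stdlib-only sketch. This is not the NeMo Curator API; the `exact_dedup` helper and its normalization choices are hypothetical, shown only to convey the technique of hashing documents so duplicates can be detected without pairwise string comparison:

```python
import hashlib

def exact_dedup(docs):
    """Keep the first occurrence of each unique document text.

    Hashing the normalized text (instead of comparing full strings)
    keeps memory bounded, which mirrors how exact deduplication is
    typically done at scale.
    """
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.md5(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = [
    "Bangkok is the capital of Thailand.",
    "bangkok is the capital of thailand.",  # duplicate after normalization
    "Chiang Mai is in northern Thailand.",
]
print(len(exact_dedup(docs)))  # 2
```

Fuzzy deduplication extends the same idea with locality-sensitive hashing (e.g. MinHash) so that near-duplicates, not just byte-identical texts, land in the same bucket.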

Training and Customization#

- Quickstart with NeMo 2.0 API: Shows how to run a simple training loop using NeMo 2.0, built on the train API from the NeMo Framework LLM collection.

- Pre-training & PEFT Quickstart with NeMo Run: Introduces how to run any of the supported NeMo 2.0 recipes using NeMo-Run, then takes a pretraining recipe and a fine-tuning recipe and shows how to run them locally as well as remotely on a Slurm-based cluster.

- Long-Context LLM Training with NeMo Run: Demonstrates how to use NeMo 2.0 recipes with NeMo-Run for long-context model training and for extending the context length of an existing pretrained model.

- Llama 3 Supervised Fine-Tuning and Parameter Efficient Fine-Tuning with NeMo 2.0: Shows how to perform supervised fine-tuning (SFT) and LoRA-based parameter-efficient fine-tuning of Llama 3 using notebooks built on NeMo 2.0 and NeMo-Run.

- Parameter Efficient Fine-Tuning with NeMo AutoModel: Shows how to perform parameter-efficient fine-tuning on models available on the Hugging Face Hub with NeMo AutoModel.

- Supervised Fine-Tuning with NeMo AutoModel: Shows how to perform supervised fine-tuning on models available on the Hugging Face Hub with NeMo AutoModel.

- NeMo SlimPajama Data Pipeline and Pretraining Tutorial: Provides step-by-step instructions for preprocessing the SlimPajama dataset and pretraining a Llama-based model using the NeMo 2.0 library.

- Domain Adaptive Pre-Training (DAPT) with Llama2 7B: Demonstrates how to perform DAPT on pretrained models such as Llama2-7B using NeMo 2.0 recipes with NeMo-Run.

- Finetuning Llama 3.2 Model into Embedding Model: Provides a detailed walkthrough of fine-tuning a Llama 3.2 model into an embedding model using NeMo 2.0.
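Several of the fine-tuning tutorials above use LoRA, which freezes the pretrained weight matrix W and trains only a low-rank update, so the effective weight is W' = W + (alpha / r) * B @ A. The pure-Python sketch below is illustrative only (it is not NeMo code, and `lora_update` is a hypothetical helper); it just makes the arithmetic of the low-rank update concrete:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_update(w, b, a, alpha, r):
    """Effective weight W' = W + (alpha / r) * B @ A.

    W (d_out x d_in) stays frozen; only the small factors
    B (d_out x r) and A (r x d_in) are trained, so the number of
    trainable parameters scales with r, not with d_out * d_in.
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# 2x2 frozen weight, rank-1 update
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d_out x r
a = [[0.5, 0.5]]     # r x d_in
print(lora_update(w, b, a, alpha=1.0, r=1))  # [[1.5, 0.5], [1.0, 2.0]]
```

Because only B and A receive gradients, a rank-16 adapter on a 4096x4096 projection trains roughly 131K parameters instead of about 16.8M.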

World Foundation Models#

Post Training#

Explore examples of post-training techniques using World Foundation Models:

- Cosmos Diffusion Models: Shows how to post-train Cosmos Diffusion-based World Foundation Models using the NeMo Framework for your custom physical AI tasks.

- Cosmos Autoregressive Models: Shows how to post-train Cosmos Autoregressive-based World Foundation Models using the NeMo Framework for your custom physical AI tasks.

Speech AI#

Most NeMo Speech AI tutorials can be run on Google Colab.

Running Tutorials on Colab#

To run a tutorial:

  1. Click the Colab link for the tutorial you are interested in below.

  2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.

Speech AI Fundamentals#

- Getting Started: NeMo Fundamentals
- Getting Started: Audio translator example
- Getting Started: Voice swap example
- Getting Started: NeMo Models
- Getting Started: NeMo Adapters
- Getting Started: NeMo Models on Hugging Face Hub

Automatic Speech Recognition (ASR) Tutorials#

- ASR with NeMo
- ASR with Subword Tokenization
- Offline ASR
- Online ASR Microphone Cache Aware Streaming
- Online ASR Microphone Buffered Streaming
- ASR CTC Language Fine-Tuning
- Intro to Transducers
- ASR with Transducers
- ASR with Adapters
- Speech Commands
- Online Offline Microphone Speech Commands
- Voice Activity Detection
- Online Offline Microphone VAD
- Speaker Recognition and Verification
- Speaker Diarization Inference
- ASR with Speaker Diarization
- Online Noise Augmentation
- ASR for Telephony Speech
- Streaming inference
- Buffered Transducer inference
- Buffered Transducer inference with LCS Merge
- Offline ASR with VAD for CTC models
- Self-supervised Pre-training for ASR
- Multi-lingual ASR
- Hybrid ASR-TTS Models
- ASR Confidence Estimation
- Confidence-based Ensembles
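Many of the ASR tutorials above build on CTC models. The core of greedy CTC decoding, collapsing repeated tokens and then removing blanks, fits in a few lines; the sketch below is illustrative only and is not the NeMo decoder:

```python
def ctc_greedy_decode(token_ids, blank_id=0):
    """Collapse repeated tokens, then drop blanks.

    token_ids is the per-frame argmax of a CTC model's output.
    A blank between two identical tokens keeps them from being
    merged, which is how CTC represents repeated characters.
    """
    decoded = []
    prev = None
    for t in token_ids:
        if t != prev and t != blank_id:
            decoded.append(t)
        prev = t
    return decoded

# blank(0), 7, 7, blank, 7, 5 -> the blank preserves the second 7
print(ctc_greedy_decode([0, 7, 7, 0, 7, 5]))  # [7, 7, 5]
```

Beam-search decoders and the transducer models covered in the tutorials replace this frame-independent argmax with richer search, but the collapse-and-drop-blanks rule is the same.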

Text-to-Speech (TTS) Tutorials#

- Basic and Advanced: NeMo TTS Primer
- Basic and Advanced: TTS Speech/Text Aligner Inference
- Basic and Advanced: FastPitch and MixerTTS Model Training
- Basic and Advanced: FastPitch Finetuning
- Basic and Advanced: FastPitch and HiFiGAN Model Training for German
- Basic and Advanced: Tacotron2 Model Training
- Basic and Advanced: FastPitch Duration and Pitch Control
- Basic and Advanced: FastPitch Speaker Interpolation
- Basic and Advanced: TTS Inference and Model Selection
- Basic and Advanced: TTS Pronunciation Customization

Tools and Utilities#

- Utility Tools for Speech and Text: NeMo Forced Aligner
- Utility Tools for Speech and Text: Speech Data Explorer
- Utility Tools for Speech and Text: CTC Segmentation

Text Processing (TN/ITN) Tutorials#

- Text Normalization Techniques: Text Normalization
- Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger
- Text Normalization Techniques: WFST Tutorial
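To give a flavor of what inverse text normalization (ITN) does, the toy converter below maps a spoken-form number like "twenty three" to its written form "23". It is a deliberately tiny, hypothetical rule set; the WFST- and tagger-based systems covered in these tutorials handle dates, currency, ordinals, and much more:

```python
# Toy ITN lookup tables (illustrative; real systems use far richer grammars)
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def itn_number(words):
    """Convert a spoken-form number like 'twenty three' to '23'."""
    total = 0
    for w in words.split():
        if w in TENS:
            total += TENS[w]
        elif w in UNITS:
            total += UNITS[w]
        else:
            raise ValueError(f"unknown number word: {w}")
    return str(total)

print(itn_number("twenty three"))  # 23
```

Text normalization (TN) is the inverse mapping, written form to spoken form, and is what TTS front-ends need before synthesis.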