What are Diffusion Models?
Diffusion models are a type of generative AI that creates new data, such as images, audio or even video, by starting from random noise and gradually transforming it into something meaningful. They work by simulating a diffusion process: during training, data is slowly corrupted by noise, and the model learns to reverse this corruption step by step. By doing so, the model learns to generate high-quality samples from scratch.
Understanding Diffusion Models

- Diffusion models are generative models that learn to reverse a diffusion process to generate data. The diffusion process involves gradually adding noise to data until it becomes pure noise.
- Through this process a simple distribution is transformed into a complex data distribution in a series of small incremental steps.
- Essentially these models operate as a reverse diffusion phenomenon where noise is introduced to the data in a forward manner and removed in a reverse manner to generate new data samples.
- By learning to reverse this process diffusion models start from noise and gradually denoise it to produce data that closely resembles the training examples.
Key Components
- Forward Diffusion Process: This process involves adding noise to the data in a series of small steps. Each step slightly increases the noise, making the data progressively more random until it resembles pure noise.
- Reverse Diffusion Process: The model learns to reverse the noise-adding steps. Starting from pure noise, the model iteratively removes the noise, generating data that matches the training distribution.
- Score Function: This function estimates the gradient of the log data density with respect to the data. It helps guide the reverse diffusion process to produce realistic samples.
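For intuition, the score has a simple closed form for a zero-mean Gaussian density, which is the kind of expression score-based methods learn to approximate for real data distributions:

```latex
\nabla_x \log \mathcal{N}(x; 0, \sigma^2 I) = -\frac{x}{\sigma^2}
```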
Architecture of Diffusion Models
The architecture of diffusion models typically involves two main components:
- Forward Diffusion Process
- Reverse Diffusion Process
1. Forward Diffusion Process
In this process noise is incrementally added to the data over a series of steps. This is akin to a Markov chain where each step slightly degrades the data by adding Gaussian noise.

Mathematically, this can be represented as:
q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{\alpha_t} x_{t-1}, (1 - \alpha_t)I)
where,
- x_t is the noisy data at step t
- \alpha_t controls the amount of noise added at step t
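To make this concrete, here is a minimal sketch of one forward step, using an assumed value of \alpha_t and a toy all-zero tensor (not values from the article) so the injected noise is easy to see:

```python
import torch

# One forward-diffusion step: x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps
alpha_t = torch.tensor(0.98)   # assumed noise-retention coefficient for this step
x_prev = torch.zeros(4, 4)     # toy "image" of zeros for clarity
eps = torch.randn(4, 4)        # Gaussian noise

x_t = torch.sqrt(alpha_t) * x_prev + torch.sqrt(1 - alpha_t) * eps
print(x_t.shape)  # torch.Size([4, 4])
```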
2. Reverse Diffusion Process
The reverse process aims to reconstruct the original data by denoising the noisy data in a series of steps reversing the forward diffusion.

This is typically modelled using a neural network that predicts the noise added at each step:
p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \sigma_\theta(x_t, t))
where,
- \mu_\theta and \sigma_\theta are the mean and variance produced by the learned network with parameters \theta.
Working Principle of Diffusion Models
During training the model learns to predict the noise added at each step of the forward process. This is done by minimizing a loss function that measures the difference between the predicted and actual noise.
Forward Process (Diffusion)
- The forward process gradually corrupts the data x_0 with Gaussian noise over a sequence of time steps. Let x_t represent the noisy data at time step t. The process is defined as:
x_t = \sqrt{1 - \beta_t} x_{t-1} + \sqrt{\beta_t} \epsilon
- where \beta_t is the noise schedule that controls the amount of noise added at each step and \epsilon \sim \mathcal{N}(0, I) is Gaussian noise.
- As t increases, x_t becomes noisier until it approximates a standard Gaussian distribution.
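Unrolling the one-step update gives a closed-form expression for x_t directly in terms of x_0, which is the form most implementations (including the code in this article) actually sample from:

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s), \qquad \epsilon \sim \mathcal{N}(0, I)
```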
Reverse Process (Denoising)
- The reverse process aims to reconstruct the original data x_0 from the noisy data x_T at the final time step T.
- This process is modelled using a neural network that approximates the conditional probability p_\theta(x_{t-1} | x_t).
- The reverse process can be formulated as:
x_{t-1} = \frac{1}{\sqrt{1 - \beta_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}} \epsilon_\theta(x_t, t) \right)
- where \epsilon_\theta is a neural network parameterized by \theta that predicts the noise and \bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s) is the cumulative product of the noise-retention factors.
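As an illustration, a single denoising update might look like the sketch below. The network \epsilon_\theta is replaced by a hypothetical placeholder that returns zeros, and the schedule values are assumed, not trained:

```python
import torch

beta_t = torch.tensor(0.02)       # assumed noise-schedule value at this step
alpha_bar_t = torch.tensor(0.5)   # assumed cumulative product of (1 - beta) up to t

def epsilon_theta(x, t):
    # placeholder standing in for a trained noise-prediction network
    return torch.zeros_like(x)

x_t = torch.randn(1, 1, 28, 28)   # current noisy sample
x_prev = (1 / torch.sqrt(1 - beta_t)) * (
    x_t - (beta_t / torch.sqrt(1 - alpha_bar_t)) * epsilon_theta(x_t, 100)
)
print(x_prev.shape)  # torch.Size([1, 1, 28, 28])
```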
Training Diffusion Models
- The training objective minimizes the difference between the true noise \epsilon added in the forward process and the noise predicted by the neural network \epsilon_\theta.
- The score function, which estimates the gradient of the log data density, plays an important role in guiding the reverse process.
- The loss function is typically the mean squared error (MSE) between these two quantities:
L(\theta) = \mathbb{E}_{x_0, \epsilon, t} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]
- This encourages the model to accurately predict the noise and, consequently, to denoise effectively during the reverse process.
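A toy computation of this loss, with a made-up "prediction" that equals the true noise plus a small error (so the MSE is just the variance of that error):

```python
import torch
import torch.nn.functional as F

true_noise = torch.randn(8, 1, 28, 28)
pred_noise = true_noise + 0.1 * torch.randn_like(true_noise)  # imperfect prediction

loss = F.mse_loss(pred_noise, true_noise)
print(round(loss.item(), 3))  # close to 0.01, the variance of the added error
```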
Implementation
Step 1: Import Necessary Libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
Step 2: Beta Schedule and Noise Schedule
Defines how much noise is added at each time step. The schedule increases linearly over time, starting from a small value and ending at a larger one.
def linear_beta_schedule(timesteps):
    beta_start = 0.0001
    beta_end = 0.02
    return torch.linspace(beta_start, beta_end, timesteps)

T = 200
betas = linear_beta_schedule(T)
alphas = 1. - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
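A quick sanity check (not part of the original code) shows how this schedule behaves: the cumulative product \bar\alpha_t starts near 1 and decays monotonically, so late timesteps retain almost no signal:

```python
import torch

betas = torch.linspace(0.0001, 0.02, 200)   # same linear schedule as above
alphas_cumprod = torch.cumprod(1. - betas, dim=0)

print(alphas_cumprod[0].item())    # ~0.9999: almost no noise at the first step
print(alphas_cumprod[-1].item())   # ~0.13: mostly noise by the last step
```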
Step 3: Forward Diffusion Process
Gradually adds Gaussian noise to the original image across many steps. The image becomes more and more noisy with each step.
def forward_diffusion_sample(x_0, t, noise=None):
    """
    Add noise to the image x_0 at timestep t using the closed-form forward process.
    """
    if noise is None:
        noise = torch.randn_like(x_0)
    # move the precomputed schedule to the same device as the data before indexing,
    # so this works when x_0 and t live on the GPU
    ac = alphas_cumprod.to(x_0.device)[t]
    sqrt_alphas_cumprod = torch.sqrt(ac)[:, None, None, None]
    sqrt_one_minus_alphas_cumprod = torch.sqrt(1 - ac)[:, None, None, None]
    return sqrt_alphas_cumprod * x_0 + sqrt_one_minus_alphas_cumprod * noise, noise
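A self-contained check of this closed-form noising (the schedule is redefined locally so the snippet runs on its own): at t = 0 the output is essentially the clean image, while at t = T - 1 it is close to pure unit-variance noise.

```python
import torch

T = 200
betas = torch.linspace(0.0001, 0.02, T)
alphas_cumprod = torch.cumprod(1. - betas, dim=0)

def forward_diffusion_sample(x_0, t, noise=None):
    if noise is None:
        noise = torch.randn_like(x_0)
    ac = alphas_cumprod.to(x_0.device)[t]
    sqrt_ac = torch.sqrt(ac)[:, None, None, None]
    sqrt_one_minus_ac = torch.sqrt(1 - ac)[:, None, None, None]
    return sqrt_ac * x_0 + sqrt_one_minus_ac * noise, noise

x_0 = torch.zeros(2, 1, 28, 28)   # toy all-zero "images"
t = torch.tensor([0, T - 1])      # earliest and latest timesteps
x_noisy, noise = forward_diffusion_sample(x_0, t)

print(x_noisy[0].abs().max().item())  # tiny: barely any noise mixed in at t = 0
print(x_noisy[1].std().item())        # near 1: almost pure noise at t = T - 1
```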
Step 4: Neural Network (U-Net or Simple CNN)
A simple convolutional neural network that takes the noisy image and the time step and learns to predict the noise that was added.
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x, t):
        # Note: this simplified model ignores the timestep t;
        # a real U-Net would condition on t via a timestep embedding.
        return self.net(x)
Step 5: Training Loop
The model is trained using many noisy images to minimize the difference between the actual and predicted noise (mean squared error).
def get_data():
    transform = transforms.Compose([
        transforms.ToTensor(),
        lambda x: x * 2 - 1  # scale pixels from [0, 1] to [-1, 1]
    ])
    dataset = MNIST(root="./data", train=True, download=True, transform=transform)
    dataloader = DataLoader(dataset, batch_size=128, shuffle=True)
    return dataloader

def train(model, dataloader, optimizer, epochs=5):
    for epoch in range(epochs):
        for step, (x, _) in enumerate(dataloader):
            x = x.to(device)
            # sample a random timestep for each image in the batch
            t = torch.randint(0, T, (x.shape[0],), device=device).long()
            x_noisy, noise = forward_diffusion_sample(x, t)
            noise_pred = model(x_noisy, t)
            loss = F.mse_loss(noise_pred, noise)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch}: Loss {loss.item():.4f}")
Step 6: Sampling (Reverse Process)
Starts from pure noise and uses the model to remove noise step by step generating a clean image at the end.
@torch.no_grad()
def sample(model, image_size, num_samples):
    model.eval()
    x = torch.randn((num_samples, 1, image_size, image_size), device=device)
    for t in reversed(range(T)):
        t_tensor = torch.full((num_samples,), t, device=device, dtype=torch.long)
        pred_noise = model(x, t_tensor)
        alpha = alphas[t]
        alpha_bar = alphas_cumprod[t]
        beta = betas[t]
        # add fresh noise at every step except the final one
        if t > 0:
            noise = torch.randn_like(x)
        else:
            noise = torch.zeros_like(x)
        x = (1 / torch.sqrt(alpha)) * (
            x - (beta / torch.sqrt(1 - alpha_bar)) * pred_noise
        ) + torch.sqrt(beta) * noise
    return x
Step 7: Running the Model
This sets up the device, initializes the model and optimizer, loads the data and starts training. After training, it generates 16 sample images from noise and displays them in a 4×4 grid.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SimpleModel().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataloader = get_data()
train(model, dataloader, optimizer)
samples = sample(model, 28, 16)
grid = torchvision.utils.make_grid(samples.cpu(), nrow=4, normalize=True)
plt.imshow(grid.permute(1, 2, 0))
plt.axis("off")
plt.show()
Output: a 4×4 grid of generated MNIST-style digit samples.
Applications
- Image Generation: Diffusion models are widely used to generate realistic images from random noise. This application is especially popular in fields like art, gaming, advertising and graphic design where high quality visuals are essential.
- Image Editing and Inpainting: They enable advanced editing by filling in missing or damaged parts of an image. This is useful in photo restoration, object removal or editing specific regions without affecting the whole image.
- Text to Image Generation: By converting written prompts into images, diffusion models allow creators to bring their ideas to life visually. This is used in storytelling, concept design, marketing and more.
- Super Resolution: Diffusion models can improve the quality of low resolution images by enhancing details. This application benefits medical imaging, satellite photos and surveillance footage.
Advantages
- Flexibility: They can model complex data distributions without requiring explicit likelihood estimation.
- High Quality Generation: Diffusion models generate high-quality samples, often surpassing other generative models such as GANs.
- Stable Training: Unlike GANs, diffusion models avoid issues like mode collapse and unstable training dynamics.
- Theoretical Foundations: Based on well understood principles from stochastic processes and statistical mechanics.
Disadvantages
- Slow Sampling: Generating samples can be slow because of the many steps needed for the reverse diffusion process.
- Complexity: The architecture and training process can be complex making them challenging to implement and understand.
- Memory Usage: High memory consumption during training due to the need to store multiple intermediate steps.
- Fine Tuning: Requires careful tuning of noise schedules and other hyperparameters to achieve optimal performance.