This repository contains the code, models, and experiments for our Master's thesis titled "Distilling Intelligence: Leveraging Knowledge Distillation for Improved Resource Efficiency of Large Language Models in Business Applications". The research focuses on making LLMs more sustainable through knowledge distillation while maintaining strong performance on financial analysis tasks. The full thesis is available upon request.
This project investigates hard-label knowledge distillation as a technique to improve the resource efficiency of Large Language Models (LLMs) while preserving acceptable performance levels across varying analytical tasks. The research employs a quantitative, empirical approach to evaluate distilled models on three increasingly complex financial NLP tasks:
- Sentiment Analysis - Classifying financial phrases as positive, negative, or neutral
- Text Classification - Categorizing commodity market news headlines into multiple financial dimensions
- Summarization - Condensing lengthy earnings call transcripts into concise bullet points
Through these experiments, we empirically validate whether smaller, distilled models can achieve performance comparable to larger teacher models while requiring significantly less compute, energy, and cost.
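As a concrete illustration of the hard-label setup, the sketch below shows the teacher-labeling step for the sentiment task: a large teacher LLM is prompted for a single class label, and those hard labels later serve as supervised fine-tuning targets for the student. The prompt wording, model choice, and use of the `ollama` Python client are illustrative assumptions rather than the thesis implementation (see `src/prompts/` and the scripts below for the actual code).

```python
# Illustrative sketch of hard-label teacher labeling (not the thesis implementation).
# Assumes the Ollama server set up by init.sh is running and the teacher model is pulled.
import ollama

PROMPT = (
    "Classify the sentiment of the following financial phrase as "
    "positive, negative, or neutral. Answer with one word only.\n\nPhrase: {phrase}"
)

def teacher_label(phrase: str, teacher_model: str = "llama3.3:70b") -> str:
    """Ask the teacher LLM for a hard (single-class) label."""
    response = ollama.chat(
        model=teacher_model,
        messages=[{"role": "user", "content": PROMPT.format(phrase=phrase)}],
    )
    return response["message"]["content"].strip().lower()

if __name__ == "__main__":
    phrase = "Operating profit rose to EUR 13.1 million from EUR 8.7 million."
    # The resulting (phrase, label) pairs become the student's supervised training data.
    print(teacher_label(phrase))
```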
The repository is organized as follows:

```
.
├── scripts/              # Scripts for running experiments
│   ├── run_inference.py  # Run inference with models
│   ├── run_training.py   # Train distilled models
│   └── load_datasets.py  # Load and prepare datasets
├── src/                  # Main source code
│   ├── data/             # Data processing utilities
│   ├── evaluation/       # Evaluation metrics and procedures
│   ├── models/           # Model loading and interaction
│   ├── prompts/          # Task-specific prompts
│   └── utils/            # Utility functions
├── init.sh               # Environment initialization script
└── README.md             # This file
```
Prerequisites:
- Python 3.11+
- A CUDA-capable GPU for optimal performance (especially for larger models)
- Git
Clone the repository:

```bash
git clone https://github.com/hendrik-spl/cbs-thesis-efficient-llm-distillation.git
cd cbs-thesis-efficient-llm-distillation
```
Create the environment file:

```bash
cp .env.example .env
# Edit .env to add your API keys (WANDB_API_KEY, HF_TOKEN)
```
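The `.env` file only needs the two keys mentioned above; a placeholder example (values are hypothetical):

```
WANDB_API_KEY=your-wandb-api-key
HF_TOKEN=your-huggingface-access-token
```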
Initialize the environment:

```bash
source init.sh
```

This script handles:
- Installing dependencies via `uv`
- Setting up environment variables
- Installing Ollama (if not already present)
- Starting the Ollama server
Set up Weights & Biases:

```bash
wandb login
# Enter your API key when prompted
```
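The experiment scripts rely on Weights & Biases for run tracking. Purely for orientation, a generic tracking call looks like the following; the project name, config fields, and metric names are placeholder assumptions, not the repo's actual logging code:

```python
# Generic Weights & Biases usage for illustration only; the project name,
# config fields, and metric names are placeholders, not this repo's setup.
import wandb

run = wandb.init(
    project="llm-distillation",  # hypothetical project name
    config={"model_name": "llama3.2:1b", "dataset": "sentiment"},
)

accuracy = 0.0  # placeholder value produced by an evaluation step
wandb.log({"accuracy": accuracy})

run.finish()
```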
Run inference using a pre-trained model on a specific dataset:
```bash
uv run scripts/run_inference.py --model_name llama3.2:1b --dataset sentiment
```
Parameters:
- `--model_name`: Model to use (e.g., llama3.2:1b, llama3.3:70b)
- `--dataset`: Dataset to run inference on (`sentiment`, `gold`, or `summary`)
- `--limit`: Number of samples to process
- `--run_on_test`: Whether to run on the test set (default: `False`)
- `--use_ollama`: Whether to use Ollama (default: `False`, i.e., Hugging Face is used)
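For example, to run a larger teacher model on a limited sample of the sentiment dataset (the flag values here are illustrative):

```bash
uv run scripts/run_inference.py --model_name llama3.3:70b --dataset sentiment --limit 100
```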
Train a student model through knowledge distillation:
```bash
uv run scripts/run_training.py --student_model llama3.2:1b --teacher_model llama3.2:1b --dataset sentiment
```
Or, using the outputs of a previous inference run as teacher labels:

```bash
uv run scripts/run_training.py --student_model llama3.2:1b --teacher_model llama3.3:70b --dataset sentiment:50agree --inference_title noble-sun-21
```
Parameters:
- `--student_model`: The model to be distilled (e.g., llama3.2:1b)
- `--teacher_model`: The source model used for distillation (e.g., llama3.3:70b)
- `--dataset`: Dataset to use for distillation
- `--inference_title`: Title of the inference run whose outputs are used as teacher labels
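For orientation, the sketch below shows what the fine-tuning step conceptually amounts to: the teacher's hard labels are folded into prompt–target texts, and the small student model is fine-tuned on them with a standard causal-LM objective. The Hugging Face model id, prompt format, and hyperparameters are assumptions for illustration and differ from the actual training code.

```python
# Conceptual sketch of the student fine-tuning step (not the thesis implementation).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

student_name = "meta-llama/Llama-3.2-1B"  # assumed Hugging Face id for the student

tokenizer = AutoTokenizer.from_pretrained(student_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_name)

# Teacher-labeled pairs: each text combines the input phrase with the teacher's hard label.
pairs = [
    {"text": "Phrase: Operating profit rose to EUR 13.1 million.\nSentiment: positive"},
    {"text": "Phrase: The company expects net sales to decline this year.\nSentiment: negative"},
]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=256)

dataset = Dataset.from_list(pairs).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-distilled", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```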
Our experiments demonstrate that:
- Distilled student models consistently outperform non-distilled base models of the same size, validating the effectiveness of knowledge distillation.
- Smaller student models can achieve up to 99% of teacher model performance while reducing energy consumption by up to 99%.
- For complex tasks like summarization, the accuracy gap between teacher and student models widens, but remains acceptable for many applications.
- The initial investment in distillation is typically offset after a few thousand inference queries, with the break-even point achieved more quickly for token-intensive tasks.
- Distilled models offer dramatic improvements in inference speed, making them suitable for latency-sensitive applications.
Access to repository: https://github.com/hendrik-spl/sustainable-llm-knowledge-distillation