Catastrophic Failure of LLM Unlearning via Quantization [ICLR 25] [paper]
If you find this repo useful, please consider citing our paper. Thank you.
```bibtex
@inproceedings{
zhang2025catastrophic,
title={Catastrophic Failure of {LLM} Unlearning via Quantization},
author={Zhiwei Zhang and Fali Wang and Xiaomin Li and Zongyu Wu and Xianfeng Tang and Hui Liu and Qi He and Wenpeng Yin and Suhang Wang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=lHSeDYamnz}
}
```
To create a conda environment for Python 3.10, run:

```bash
conda env create -f environment.yml
conda activate py310
```

The two corpora, News and Books, and the associated target models are available as follows:
| Domain | Target Model for Unlearning | Dataset |
|---|---|---|
| News | Target model | Dataset |
| Books | Target model | Dataset |
Before proceeding, download all the data from HuggingFace to the root of this repository by running:

```bash
python load_data.py
```
To unlearn the target model with our baseline methods, including SURE, run unlearn.py in the baselines folder. The example scripts baselines/scripts/unlearn_news.sh and baselines/scripts/unlearn_books.sh demonstrate the usage of unlearn.py. Here is an example:
```bash
algo="ga"
CORPUS="news"

python unlearn.py \
    --algo $algo \
    --model_dir $TARGET_DIR --tokenizer_dir 'meta-llama/Llama-2-7b-hf' \
    --data_file $FORGET --retain_data_file $RETAIN \
    --out_dir "./ckpt/$CORPUS/$algo" \
    --max_len $MAX_LEN --epochs $EPOCHS --lr $LR \
    --alpha 1 --threshold 90 \
    --per_device_batch_size $PER_DEVICE_BATCH_SIZE
```

- `algo`: Unlearning algorithm to run (`ga`, `ga_gdr`, `ga_klr`, `npo`, `npo_gdr`, `npo_klr`, `ga_gdr_sure`, `ga_klr_sure`, `npo_gdr_sure`, `npo_klr_sure`, `rmu`).
- `model_dir`: Directory of the target model.
- `tokenizer_dir`: Directory of the tokenizer.
- `data_file`: Forget set.
- `retain_data_file`: Retain set for the GDR/KLR regularizations, if required by the algorithm.
- `out_dir`: Directory to save the unlearned model.
- `max_len`: Maximum input length (default: 2048).
- `per_device_batch_size`, `epochs`, `lr`: Hyperparameters.
- `alpha`: Weight for the utility constraint.
- `threshold`: Threshold to filter out salient modules (e.g., 90).
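The script above leaves several shell variables as placeholders. A minimal sketch of how they might be set is shown below; the paths and hyperparameter values are purely illustrative (not the settings used in the paper), except for `MAX_LEN`, whose default is 2048.

```bash
# Illustrative placeholders only; substitute your own paths and hyperparameters.
TARGET_DIR="./models/news_target"     # hypothetical path to the downloaded target model
FORGET="./data/news/forget.txt"       # hypothetical forget-set file
RETAIN="./data/news/retain.txt"       # hypothetical retain-set file
MAX_LEN=2048                          # default maximum input length
EPOCHS=5                              # illustrative value
LR=1e-5                               # illustrative value
PER_DEVICE_BATCH_SIZE=4               # illustrative value
```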
Resulting models are saved in the ckpt folder as shown:
```
ckpt
├── news/
│   ├── ga/
│   │   ├── checkpoint-102
│   │   ├── checkpoint-204
│   │   ├── checkpoint-306
│   │   └── ...
│   └── npo/
│       └── ...
└── books/
    └── ga
        └── ...
```
To evaluate your unlearned model(s), run eval.py from the root of this repository with the following command-line arguments:
- `--model_dirs`: A list of directories containing the unlearned models. These can be either HuggingFace model directories or local storage paths.
- `--names`: A unique name assigned to each unlearned model in `--model_dirs`. The length of `--names` should match the length of `--model_dirs`.
- `--corpus`: The corpus to use for evaluation. Options are `news` or `books`.
- `--out_file`: The name of the output file. The file will be in CSV format, with each row corresponding to an unlearning method from `--model_dirs` and columns representing the metrics specified by `--metrics`.
- `--tokenizer_dir` (Optional): The directory of the tokenizer. Defaults to `meta-llama/Llama-2-7b-hf`, which is the default tokenizer for LLaMA.
- `--metrics` (Optional): The metrics to evaluate. Options are `verbmem_f` (VerbMem Forget), `privleak` (PrivLeak), `knowmem_f` (KnowMem Forget), and `knowmem_r` (KnowMem Retain, i.e., Utility). Defaults to evaluating all of these metrics.
- `--temp_dir` (Optional): The directory for saving intermediate computations. Defaults to `temp`.
- `--quantize_4bit` (Optional): Whether to quantize the model to 4-bit.
- `--quantize_8bit` (Optional): Whether to quantize the model to 8-bit.

Set `quantize_4bit=0, quantize_8bit=0` to evaluate the model in full precision, `quantize_4bit=1, quantize_8bit=0` to evaluate it in 4-bit, and `quantize_4bit=0, quantize_8bit=1` to evaluate it in 8-bit.
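For instance, to evaluate one unlearned checkpoint under 4-bit quantization, a command along the following lines can be used. The checkpoint path is taken from the ckpt layout above, and the integer-valued flag syntax assumes the settings described above; adjust it if eval.py expects a different form.

```bash
# Evaluate a single unlearned checkpoint in 4-bit precision.
python eval.py \
    --model_dirs "ckpt/news/ga/checkpoint-102" \
    --names "ga_4bit" \
    --corpus news \
    --out_file "out_4bit.csv" \
    --quantize_4bit 1 \
    --quantize_8bit 0
```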
The example script eval.sh in the root of this repository demonstrates the usage of eval.py.
Run the following command with placeholder values:
```bash
python eval.py \
    --model_dirs "data/model1" "data/model2" \
    --names "model1" "model2" \
    --corpus books \
    --out_file "out.csv"
```

Our code is mainly built on muse_bench. The evaluation code for the Gen, Tru, Fac, and Flu tasks is based on RWKU. We appreciate their open-source code.