Catastrophic Failure of LLM Unlearning via Quantization [ICLR 25] [paper]
If you find this repo useful, please consider citing our paper. Thank you.
```bibtex
@inproceedings{
zhang2025catastrophic,
title={Catastrophic Failure of {LLM} Unlearning via Quantization},
author={Zhiwei Zhang and Fali Wang and Xiaomin Li and Zongyu Wu and Xianfeng Tang and Hui Liu and Qi He and Wenpeng Yin and Suhang Wang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=lHSeDYamnz}
}
```
To create a conda environment for Python 3.10, run:

```bash
conda env create -f environment.yml
conda activate py310
```

The two corpora, News and Books, and the associated target models are available as follows:
| Domain | Target Model for Unlearning | Dataset |
|---|---|---|
| News | Target model | Dataset |
| Books | Target model | Dataset |
Before proceeding, download all the data from HuggingFace to the root of this repository by running:

```bash
python load_data.py
```
To unlearn the target model with our baseline methods, including SURE, run unlearn.py in the baselines folder. The example scripts baselines/scripts/unlearn_news.sh and baselines/scripts/unlearn_books.sh demonstrate the usage of unlearn.py. Here is an example:
```bash
algo="ga"
CORPUS="news"

python unlearn.py \
    --algo $algo \
    --model_dir $TARGET_DIR --tokenizer_dir 'meta-llama/Llama-2-7b-hf' \
    --data_file $FORGET --retain_data_file $RETAIN \
    --out_dir "./ckpt/$CORPUS/$algo" \
    --max_len $MAX_LEN --epochs $EPOCHS --lr $LR \
    --alpha 1 --threshold 90 \
    --per_device_batch_size $PER_DEVICE_BATCH_SIZE
```

- `algo`: Unlearning algorithm to run (`ga`, `ga_gdr`, `ga_klr`, `npo`, `npo_gdr`, `npo_klr`, `ga_gdr_sure`, `ga_klr_sure`, `npo_gdr_sure`, `npo_klr_sure`, `rmu`).
- `model_dir`: Directory of the target model.
- `tokenizer_dir`: Directory of the tokenizer.
- `data_file`: Forget set.
- `retain_data_file`: Retain set for the GDR/KLR regularizations, if required by the algorithm.
- `out_dir`: Directory to save the unlearned model.
- `max_len`: Maximum input length (default: 2048).
- `per_device_batch_size`, `epochs`, `lr`: Hyperparameters.
- `alpha`: Weight for the utility constraint.
- `threshold`: Threshold to filter out salient modules (e.g., 90).
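The script above leaves several shell variables as placeholders. A minimal sketch of how they might be set is shown below; the paths and hyperparameter values are purely illustrative (not the settings used in the paper), except for `MAX_LEN`, whose default is 2048.

```bash
# Illustrative placeholders only; substitute your own paths and hyperparameters.
TARGET_DIR="./models/news_target"     # hypothetical path to the downloaded target model
FORGET="./data/news/forget.txt"       # hypothetical forget-set file
RETAIN="./data/news/retain.txt"       # hypothetical retain-set file
MAX_LEN=2048                          # default maximum input length
EPOCHS=5                              # illustrative value
LR=1e-5                               # illustrative value
PER_DEVICE_BATCH_SIZE=4               # illustrative value
```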
Resulting models are saved in the ckpt folder as shown:
```
ckpt
├── news/
│   ├── ga/
│   │   ├── checkpoint-102
│   │   ├── checkpoint-204
│   │   ├── checkpoint-306
│   │   └── ...
│   └── npo/
│       └── ...
└── books/
    └── ga
        └── ...
```
To evaluate your unlearned model(s), run eval.py from the root of this repository with the following command-line arguments:
- `--model_dirs`: A list of directories containing the unlearned models. These can be either HuggingFace model directories or local storage paths.
- `--names`: A unique name assigned to each unlearned model in `--model_dirs`. The length of `--names` should match the length of `--model_dirs`.
- `--corpus`: The corpus to use for evaluation. Options are `news` or `books`.
- `--out_file`: The name of the output file. The file will be in CSV format, with each row corresponding to an unlearning method from `--model_dirs` and columns representing the metrics specified by `--metrics`.
- `--tokenizer_dir` (Optional): The directory of the tokenizer. Defaults to `meta-llama/Llama-2-7b-hf`, which is the default tokenizer for LLaMA.
- `--metrics` (Optional): The metrics to evaluate. Options are `verbmem_f` (VerbMem Forget), `privleak` (PrivLeak), `knowmem_f` (KnowMem Forget), and `knowmem_r` (KnowMem Retain, i.e., Utility). Defaults to evaluating all of these metrics.
- `--temp_dir` (Optional): The directory for saving intermediate computations. Defaults to `temp`.
- `--quantize_4bit` (Optional): Whether to quantize the model to 4-bit.
- `--quantize_8bit` (Optional): Whether to quantize the model to 8-bit.

Set `quantize_4bit=0, quantize_8bit=0` to evaluate the model in full precision, `quantize_4bit=1, quantize_8bit=0` to evaluate it in 4-bit, and `quantize_4bit=0, quantize_8bit=1` to evaluate it in 8-bit.
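For instance, to evaluate one unlearned checkpoint under 4-bit quantization, a command along the following lines can be used. The checkpoint path is taken from the ckpt layout above, and the integer-valued flag syntax assumes the settings described above; adjust it if eval.py expects a different form.

```bash
# Evaluate a single unlearned checkpoint in 4-bit precision.
python eval.py \
    --model_dirs "ckpt/news/ga/checkpoint-102" \
    --names "ga_4bit" \
    --corpus news \
    --out_file "out_4bit.csv" \
    --quantize_4bit 1 \
    --quantize_8bit 0
```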
The example script eval.sh in the root of this repository demonstrates the usage of eval.py.
Run the following command with placeholder values:
```bash
python eval.py \
    --model_dirs "data/model1" "data/model2" \
    --names "model1" "model2" \
    --corpus books \
    --out_file "out.csv"
```

Our code is mainly built on muse_bench. The evaluation code for the Gen, Tru, Fac, and Flu tasks is based on RWKU. We appreciate their open-source code.