QCATune

A fine-tuning framework based on question, context, and answer relationships for enhancing legal information retrieval

📄 About the Paper

Legal document retrieval is a complex and essential task in the legal domain: given a specific question, the system must retrieve the legal documents relevant to it. The complexity of legal texts and the level of comprehension they demand make this difficult. The difficulty is greatest in low-resource languages and specialized domains, where data scarcity and linguistic nuances impede effective retrieval.

To address these issues, we introduce QCATune, a fine-tuning framework based on the relationships between questions, contexts, and answers. The framework offers two approaches: the first (QC-QA) fine-tunes on question-context and question-answer relationships, and the second (QC-QA-AC) extends this by also incorporating answer-context relationships.
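As a rough illustration of how the two approaches relate, the sketch below combines one in-batch contrastive term per relationship. It is not the code in custom_loss.py: the weighted-sum form, the roles given to `alpha` and `beta`, the `scale` value, and the function names `in_batch_contrastive` and `combined_loss` are all assumptions made for this example.

```python
import numpy as np

def _normalize(x):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def in_batch_contrastive(a, b, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    Row i of `a` is treated as matching row i of `b`; all other rows in the
    batch act as negatives. This is a generic stand-in loss (assumption).
    """
    sims = scale * _normalize(a) @ _normalize(b).T
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def combined_loss(q, c, a, loss_option="qc-qa", alpha=0.3, beta=0.3):
    """Hypothetical weighted sum of the QC, QA, and AC terms."""
    if loss_option not in ("qc", "qc-qa", "qc-qa-ac"):
        raise ValueError(f"unknown loss_option: {loss_option!r}")
    loss = in_batch_contrastive(q, c)  # question-context (baseline)
    if loss_option in ("qc-qa", "qc-qa-ac"):
        loss += alpha * in_batch_contrastive(q, a)  # question-answer (assumed weight)
    if loss_option == "qc-qa-ac":
        loss += beta * in_batch_contrastive(a, c)  # answer-context (assumed weight)
    return loss
```

Here `q`, `c`, and `a` are arrays of shape (batch, dim) holding question, context, and answer embeddings. Because each term is a non-negative cross-entropy, adding a relationship can only increase the total in this sketch.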

This repository accompanies our article:

A fine-tuning framework based on question, context, and answer relationships for enhancing legal information retrieval
Engineering Applications of Artificial Intelligence, 159 (2025) 111570
https://doi.org/10.1016/j.engappai.2025.111570

Directory Structure

QCATune/
├── data_preparation/
├── Raw_data/
├── Synthetic_data_generation/
├── data_rag/
│   ├── vibilaw/
│   ├── coling2020/
│   ├── zalo2021/
├── results/
│   ├── results_vibilaw/
│   ├── results_coling2020/
│   ├── results_zalo2021/
│   │   ├── <model_name>/
│   │   │   ├── epoch_1/
│   │   │   ├── ...
│   │   │   ├── epoch_5/
│   │   │   └── metrics.json
├── custom_loss.py
├── fine_tune_model_vibilaw.py
├── fine_tune_model_zalo2021.py
├── fine_tune_model_coling2020.py
└── README

Fine-Tune Model

Config for Fine-Tuning:

models: List of models to train (e.g., keepitreal/vietnamese-sbert).
alphas: List of alpha values to experiment with (e.g., [0.2, 0.3, 0.4, 0.5]).
betas: List of beta values to experiment with (e.g., [0.2, 0.3, 0.4, 0.5]).
loss_option: Loss configuration to use. Set "qc" for baseline fine-tuning, "qc-qa" for QC-QA fine-tuning, or "qc-qa-ac" for QC-QA-AC fine-tuning.
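The options above could be collected and expanded into individual runs roughly as follows. This is a sketch only: the `config` dictionary layout and the `enumerate_runs` helper are hypothetical, and the rule that `alphas` applies only when a question-answer term is present and `betas` only when an answer-context term is present is an assumption, not something the fine-tuning scripts are known to do.

```python
from itertools import product

VALID_LOSS_OPTIONS = {"qc", "qc-qa", "qc-qa-ac"}

# Hypothetical config layout mirroring the four options described above.
config = {
    "models": ["keepitreal/vietnamese-sbert"],
    "alphas": [0.2, 0.3, 0.4, 0.5],
    "betas": [0.2, 0.3, 0.4, 0.5],
    "loss_option": "qc-qa-ac",
}

def enumerate_runs(cfg):
    """Expand a config into one dict per (model, alpha, beta) combination."""
    option = cfg["loss_option"]
    if option not in VALID_LOSS_OPTIONS:
        raise ValueError(f"loss_option must be one of {sorted(VALID_LOSS_OPTIONS)}")
    # Assumption: alpha is unused by the "qc" baseline,
    # and beta is used only by "qc-qa-ac".
    alphas = cfg["alphas"] if option != "qc" else [None]
    betas = cfg["betas"] if option == "qc-qa-ac" else [None]
    return [
        {"model": m, "alpha": a, "beta": b, "loss_option": option}
        for m, a, b in product(cfg["models"], alphas, betas)
    ]
```

With the example values, `enumerate_runs(config)` yields 16 runs for "qc-qa-ac" (4 alphas × 4 betas for one model), which gives a quick sense of how many training jobs a grid implies.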

Fine-Tuning Results:

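Fine-tuned models are saved under the results folder, following the directory structure shown above: one folder per model, containing a subfolder for each epoch and a metrics.json file.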
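To locate the saved checkpoints for one model, a small helper along these lines may be convenient. It relies only on the `epoch_<n>` folder naming shown in the directory structure; the function name `list_epoch_dirs` is hypothetical, and the contents of metrics.json are not assumed here.

```python
import re
from pathlib import Path

def list_epoch_dirs(model_dir):
    """Return the epoch_<n> subfolders of `model_dir`, sorted by epoch number."""
    pattern = re.compile(r"epoch_(\d+)$")
    found = []
    for child in Path(model_dir).iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    # Sort numerically so that epoch_10 comes after epoch_2.
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

For example, `list_epoch_dirs("results/results_zalo2021/<model_name>")` would return the epoch folders in training order, with `<model_name>` replaced by the actual folder name.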
