
QCATune: A Question-Context-Answer Fine-tuning Framework for Enhancing Legal Document Retrieval


haipn91/QCATune


QCATune

A fine-tuning framework based on question, context, and answer relationships for enhancing legal information retrieval


📄 About the Paper

Legal document retrieval is a complex and essential task within the legal domain, requiring the extraction of relevant legal documents based on specific questions. The complexity of legal texts, along with the high level of comprehension required, poses significant challenges. These challenges are particularly pronounced in low-resource languages and specialized domains, where data scarcity and linguistic nuances impede effective retrieval.

To address these issues, we introduce QCATune, a fine-tuning framework based on the relationships between questions, contexts, and answers. The framework proposes two approaches: the first fine-tunes on question-context (QC) and question-answer (QA) relationships, and the second extends this by also incorporating answer-context (AC) relationships.
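The two approaches can be illustrated with a minimal sketch of how relationship-specific losses might be combined. It assumes each relationship is trained with an in-batch contrastive objective over cosine similarities (the style of sentence-transformers' MultipleNegativesRankingLoss, whose default scale is 20) and that the extra terms are added with the alpha and beta weights listed in the fine-tuning config. The function names, the weighting scheme, and the toy embeddings are hypothetical and are not taken from custom_loss.py.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(anchors: List[Vector], positives: List[Vector],
                              scale: float = 20.0) -> float:
    """Mean softmax cross-entropy: anchor i should score positives[i] above
    every other positive in the batch (the other rows act as negatives)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def qcatune_loss(q: List[Vector], c: List[Vector], a: List[Vector],
                 option: str = "qc-qa", alpha: float = 0.3, beta: float = 0.3) -> float:
    """Hypothetical weighting: L_qc, plus alpha * L_qa for "qc-qa",
    plus beta * L_ac for "qc-qa-ac". Row i of q, c, a is one triple."""
    if option not in ("qc", "qc-qa", "qc-qa-ac"):
        raise ValueError(f"unknown loss option: {option!r}")
    loss = in_batch_contrastive_loss(q, c)  # question -> context (baseline "qc")
    if option in ("qc-qa", "qc-qa-ac"):
        loss += alpha * in_batch_contrastive_loss(q, a)  # question -> answer
    if option == "qc-qa-ac":
        loss += beta * in_batch_contrastive_loss(a, c)  # answer -> context
    return loss


q = [[1.0, 0.0], [0.0, 1.0]]
c_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each context now matches the wrong question
print(round(qcatune_loss(q, q, q, "qc-qa-ac"), 3))  # aligned triples: 0.0
print(round(qcatune_loss(q, c_swapped, q, "qc-qa-ac", alpha=0.5), 1))  # 26.0
```

In this sketch every term is non-negative, so switching from "qc" to "qc-qa" or "qc-qa-ac" can only add training signal; how the real framework balances the terms may differ.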

This repository accompanies our article:

A fine-tuning framework based on question, context, and answer relationships for enhancing legal information retrieval
Engineering Applications of Artificial Intelligence, 159 (2025) 111570
https://doi.org/10.1016/j.engappai.2025.111570

Directory Structure

QCATune/
├── data_preparation/
├── Raw_data/
├── Synthetic_data_generation/
├── data_rag/
│   ├── vibilaw/
│   ├── coling2020/
│   ├── zalo2021/
├── results/
│   ├── results_vibilaw/
│   ├── results_coling2020/
│   ├── results_zalo2021/
│   │   ├── <model_name>/
│   │   │   ├── epoch_1/
│   │   │   ├── ...
│   │   │   ├── epoch_5/
│   │   │   └── metrics.json
├── custom_loss.py
├── fine_tune_model_vibilaw.py
├── fine_tune_model_zalo2021.py
├── fine_tune_model_coling2020.py
└── README

Fine-Tune Model

Config for Fine-Tuning:

  • models: List of models to train (e.g., keepitreal/vietnamese-sbert).
  • alphas: List of alpha values to experiment with (e.g., [0.2, 0.3, 0.4, 0.5]).
  • betas: List of beta values to experiment with (e.g., [0.2, 0.3, 0.4, 0.5]).
  • loss_option: Set to "qc" for baseline fine-tuning, "qc-qa" for QC-QA fine-tuning, or "qc-qa-ac" for QC-QA-AC fine-tuning.
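A minimal sketch of how these options could expand into a grid of training runs. The CONFIG dictionary and the expand_runs helper are hypothetical; the fine_tune_model_*.py scripts may store the settings differently. The sketch also assumes that "qc" ignores both weights, "qc-qa" uses only alpha, and "qc-qa-ac" uses alpha and beta.

```python
from itertools import product
from typing import Dict, List

# Hypothetical config mirroring the options listed above; the real
# fine_tune_model_*.py scripts may name or structure these differently.
CONFIG = {
    "models": ["keepitreal/vietnamese-sbert"],
    "alphas": [0.2, 0.3, 0.4, 0.5],
    "betas": [0.2, 0.3, 0.4, 0.5],
    "loss_option": "qc-qa-ac",
}

VALID_OPTIONS = ("qc", "qc-qa", "qc-qa-ac")


def expand_runs(config: Dict) -> List[Dict]:
    """One run per (model, alpha, beta) combination the chosen loss option uses."""
    option = config["loss_option"]
    if option not in VALID_OPTIONS:
        raise ValueError(f"loss_option must be one of {VALID_OPTIONS}, got {option!r}")
    # Assumption: "qc" ignores both weights, "qc-qa" uses only alpha,
    # and "qc-qa-ac" uses alpha and beta.
    alphas = [None] if option == "qc" else config["alphas"]
    betas = config["betas"] if option == "qc-qa-ac" else [None]
    return [
        {"model": m, "loss_option": option, "alpha": a, "beta": b}
        for m, a, b in product(config["models"], alphas, betas)
    ]


runs = expand_runs(CONFIG)
print(len(runs))  # 1 model x 4 alphas x 4 betas -> 16
```

Skipping unused weights keeps the sweep from training identical models several times, for example running "qc-qa" once per beta value.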

Fine-Tuning Results:

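Fine-tuned models are saved under results/ following the directory structure above: one folder per dataset (results_vibilaw/, results_coling2020/, results_zalo2021/), then one folder per model containing a checkpoint for each epoch (epoch_1/ to epoch_5/) and a metrics.json file.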
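A minimal sketch for locating the outputs of one fine-tuned model, based only on the directory tree above. The helper names are hypothetical, and the README does not say how a Hugging Face model ID such as keepitreal/vietnamese-sbert is mapped to a folder name, so model_name should be whatever folder the training script actually created. The contents of metrics.json are not documented here, so the sketch only parses the file without assuming any keys.

```python
import json
from pathlib import Path
from typing import Dict, List

DATASETS = ("vibilaw", "coling2020", "zalo2021")


def result_paths(root: str, dataset: str, model_name: str, epochs: int = 5) -> Dict[str, object]:
    """Expected output locations for one fine-tuned model, following the tree above."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    model_dir = Path(root) / "results" / f"results_{dataset}" / model_name
    return {
        "checkpoints": [model_dir / f"epoch_{k}" for k in range(1, epochs + 1)],
        "metrics": model_dir / "metrics.json",
    }


def existing_checkpoints(paths: Dict[str, object]) -> List[Path]:
    """Keep only the epoch folders that are actually on disk."""
    return [p for p in paths["checkpoints"] if p.is_dir()]


def load_metrics(paths: Dict[str, object]) -> object:
    """Parse metrics.json; its schema depends on the training script."""
    with open(paths["metrics"], encoding="utf-8") as f:
        return json.load(f)


paths = result_paths("QCATune", "zalo2021", "vietnamese-sbert")
print(paths["metrics"].as_posix())
print(len(existing_checkpoints(paths)), "checkpoint(s) found")
```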
