
🛠️ PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework

🔍 Overview | 🛠️ Installation | 🚀 Quick Start | 📝 Citation

News

  • 🎉 [May 2025] PatchPilot accepted at ICML 2025!
  • 🚀 [May 2025] PatchPilot's code is now open-source!
  • 🚀 [February 2025] PatchPilot achieves superior performance on SWE-bench while maintaining low cost (< $1 per instance)!
  • 📄 [February 2025] The PatchPilot paper is available on arXiv!

Overview

🛠️ PatchPilot: Balancing Efficacy, Stability, and Cost-Efficiency

PatchPilot is an innovative rule-based planning patching tool that strikes an excellent balance among patching efficacy, stability, and cost-efficiency.

Key Innovations:

  • 🎯 Five-Component Workflow: Reproduction, Localization, Generation, Validation, and Refinement
  • 💰 Cost-Efficient: Less than $1 per instance while maintaining high performance
  • 🔒 High Stability: More stable than agent-based planning methods
  • ⚡ Superior Performance: Outperforms existing open-source methods on SWE-bench

πŸ—οΈ Architecture Overview

PatchPilot's workflow consists of five specialized components:

  1. 🔄 Reproduction: Reproduce the reported bug to understand the issue
  2. 🔍 Localization: Identify problematic code locations with multi-level analysis
  3. ⚡ Generation: Generate high-quality patch candidates
  4. 🛡️ Validation: Validate patches through comprehensive testing
  5. ✨ Refinement: Unique refinement step to improve patch quality
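Each stage maps onto a script invoked in the Quick Start below. As a rough sketch of the end-to-end order (flags elided here and shown in full in the following sections):

# 1. Reproduction
python patchpilot/reproduce/reproduce.py ...
# 2. Localization (two passes: localize, then merge)
python patchpilot/fl/localize.py ...
# 3-5. Generation, Validation, and Refinement run inside a single repair invocation
python patchpilot/repair/repair.py ... --refine_mod
# Final scoring via the SWE-bench harness
python -m swebench.harness.run_evaluation ...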

Installation

🐳 Docker Setup (Recommended)

  1. Pull the Docker image:
docker pull 3rdn4/patchpilot_verified:v1
  2. Run the container with Docker-in-Docker support:
docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock -it 3rdn4/patchpilot_verified:v1

Note: --privileged and -v /var/run/docker.sock:/var/run/docker.sock are required for the Docker-in-Docker functionality used by SWE-bench.
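To confirm Docker-in-Docker is working before continuing, a quick smoke test from inside the container (this assumes the host daemon is reachable through the mounted socket):

docker ps  # should succeed and list the host's containers rather than error out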

  3. Set up the environment inside the container:
cd /opt
git clone git@github.com:ucsb-mlsec/PatchPilot.git
cd PatchPilot
conda activate patchpilot
export PYTHONPATH=$PYTHONPATH:$(pwd)
  4. Configure API keys:
# For Anthropic Claude
export ANTHROPIC_API_KEY=your_anthropic_key_here

# OR for OpenAI
export OPENAI_API_KEY=your_openai_key_here
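Only the key for the provider you intend to use is needed; the --backend and --model flags documented under Configuration Parameters below select which one is called. For example (the model name here is purely illustrative):

python patchpilot/repair/repair.py ... --backend claude --model claude-3-5-sonnet-20241022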

Quick Start

🔄 1. Reproduction

First, reproduce the bugs to understand the issues:

python patchpilot/reproduce/reproduce.py \
    --reproduce_folder results/reproduce \
    --num_threads 50 \
    --setup_map setup_result/verified_setup_map.json \
    --tasks_map setup_result/verified_tasks_map.json \
    --task_list_file swe_verify_tasks.txt
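Here swe_verify_tasks.txt selects which SWE-bench instances to process. A plausible layout, assuming standard SWE-bench Verified instance IDs, is one ID per line:

# swe_verify_tasks.txt (illustrative excerpt)
astropy__astropy-12907
django__django-11099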

🔍 2. Localization

Step 1: Multi-Level Localization

python patchpilot/fl/localize.py \
    --file_level \
    --direct_line_level \
    --output_folder results/localization \
    --top_n 5 \
    --compress \
    --context_window=20 \
    --temperature 0.7 \
    --match_partial_paths \
    --reproduce_folder results/reproduce \
    --task_list_file swe_verify_tasks.txt \
    --num_samples 4 \
    --num_threads 16 \
    --benchmark verified

Step 2: Merge Localization Results

python patchpilot/fl/localize.py \
    --merge \
    --output_folder results/localization/merged \
    --start_file results/localization/loc_outputs.jsonl \
    --num_samples 4
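The merged file consumed by the repair step below is loc_all_merged_outputs.jsonl. A generic sanity check (plain JSONL inspection, not a PatchPilot command) is to pretty-print the first record:

head -n 1 results/localization/merged/loc_all_merged_outputs.jsonl | python -m json.tool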

⚡ 3. Repair and Validation

Generate patches with integrated validation:

python patchpilot/repair/repair.py \
    --loc_file results/localization/merged/loc_all_merged_outputs.jsonl \
    --output_folder results/repair \
    --loc_interval \
    --top_n=5 \
    --context_window=20 \
    --max_samples 12 \
    --batch_size 4 \
    --benchmark verified \
    --reproduce_folder results/reproduce \
    --verify_folder results/verify \
    --setup_map setup_result/verified_setup_map.json \
    --tasks_map setup_result/verified_tasks_map.json \
    --num_threads 16 \
    --task_list_file swe_verify_tasks.txt \
    --refine_mod

Note: Functionality tests are generated via useful_scripts/generate_functest.py and do not use the pass_to_pass approach.
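The options of generate_functest.py are not documented here; assuming it exposes a standard argparse-style command-line interface, they can be listed with:

python useful_scripts/generate_functest.py --help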

📊 4. Evaluation

Run SWE-bench evaluation on the generated patches:

cd /opt/orig_swebench/SWE-bench
conda activate swe_bench

python -m swebench.harness.run_evaluation \
    --predictions_path [path_to_best_patches_round_2.jsonl] \
    --max_workers 16 \
    --run_id [experiment_name]
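The predictions file follows the standard SWE-bench format: JSON Lines with one object per instance. A minimal illustrative record (field names per the SWE-bench harness; the patch body is truncated here):

{"instance_id": "astropy__astropy-12907", "model_name_or_path": "patchpilot", "model_patch": "diff --git a/..."}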

Configuration Parameters

Parameter            Description
--max_samples        Total number of patch samples to generate per instance
--batch_size         Number of samples generated per batch (early stopping if validation passes)
--num_threads        Number of parallel processing threads
--task_list_file     File containing instances to be fixed
--loc_file           Output file from the localization step
--backend            Model backend (claude, openai, etc.)
--model              Specific model version
--loc_interval       Provide multiple context intervals vs. min-max range only
--top_n              Number of files to consider as context
--context_window     Lines of context around localized code
--refine_mod         Enable PatchPilot's unique refinement component

🔄 Resuming Interrupted Experiments

If an experiment is interrupted, simply rerun the same command; PatchPilot will resume from where it left off. To start a different experiment, clean the output folders or use different output directories.
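For example, to restart from scratch with the same output paths (destructive, so double-check before running):

rm -rf results/reproduce results/localization results/repair results/verify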

📝 Citation

If you find PatchPilot useful in your research, please cite our paper:

@article{li2025patchpilot,
  title={PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework},
  author={Li, Hongwei and Tang, Yuheng and Wang, Shiqi and Guo, Wenbo},
  journal={arXiv preprint arXiv:2502.02747},
  year={2025}
}

Made with ❤️ by the UCSB ML Security Team
