This is the official repository for the paper "Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving" (ICML 2025 CFAgentic Workshop Best Paper Runner-Up Award).
This work was done by Xiangru Tang*, Tianrui Qin*, Tianhao Peng*, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, and Wangchunshu Zhou.
- Hierarchical Memory Structure: Combines working memory, episodic memory, and semantic knowledge base.
- Agentic Reasoning: Supports autonomous decision-making and planning using LLMs.
- Cross-Domain Adaptability: Designed for generalization across different task domains (e.g., QA, coding, planning).
- Modular Design: Easy to integrate with various benchmarks and environments.
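As an illustration of the hierarchical memory idea above, here is a minimal sketch; the class and method names are hypothetical and not the repository's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class HierarchicalMemory:
    """Toy three-tier memory: working (current task), episodic (past
    traces), and semantic (distilled knowledge). Illustrative only."""
    working: list = field(default_factory=list)    # current task context
    episodic: list = field(default_factory=list)   # past task traces
    semantic: dict = field(default_factory=dict)   # distilled knowledge

    def remember(self, observation: str) -> None:
        """Append an observation to working memory."""
        self.working.append(observation)

    def consolidate(self, task: str, outcome: str) -> None:
        """Archive the finished task's trace into episodic memory."""
        self.episodic.append(
            {"task": task, "trace": list(self.working), "outcome": outcome}
        )
        self.working.clear()
```

In this sketch, consolidation empties working memory into an episodic record, which a retrieval step could later surface as cross-domain experience.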
To start, follow the steps below:
cd ./Agent-KB-GAIA/examples/open_deep_research

Run the following commands to install the required dependencies from the requirements.txt file:

pip install -r requirements.txt
pip install -e ../../.[dev]

The agent uses the SearchTool for web search, which requires an environment variable with the corresponding API key, based on the selected provider:
SERP_API_KEY for SerpApi: sign up on the SerpApi website to get a key
Depending on the model you want to use, you may need to set environment variables. For OpenAI-compatible models, set the OPENAI_BASE_URL and OPENAI_API_KEY environment variables.
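For example, with an OpenAI-compatible endpoint and SerpApi (the URL and key values below are placeholders, not real credentials):

```shell
# Placeholders -- substitute your own endpoint and keys.
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk-your-openai-key"
export SERP_API_KEY="your-serpapi-key"
```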
Download the GAIA dataset and place it under the following directory:
./data/gaia
The expected directory structure is as follows:
├── data
│   └── gaia
│       ├── test
│       └── validation
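Before launching a run, you can sanity-check that the dataset is in place. This is a hypothetical helper, not part of the repository:

```python
from pathlib import Path

def missing_gaia_dirs(root: str = ".") -> list:
    """Return the expected GAIA split directories missing under root."""
    expected = [Path(root, "data", "gaia", split)
                for split in ("test", "validation")]
    return [str(p) for p in expected if not p.is_dir()]
```

An empty return value means both the test and validation splits were found.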
You're now all set to run on GAIA! Simply execute the run_gaia.py script like so:

python run_gaia.py --model-id openai:gpt-4.1 --model-id-search openai:gpt-4.1 --run-name gpt-4.1-gaia

If you'd like to use different questions or datasets, refer to the run_gaia.py script for guidance and make the necessary adjustments.
Now, let's start configuring Agent KB.
Format your knowledge base samples properly and save them in the following file:
./agent_kb/agent_kb_database.json
Each sample in the JSON file should follow this structure:
{
"question": "",
"agent_planning": "",
"search_agent_planning": "",
"agent_experience": "",
"search_agent_experience": ""
}
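Before saving the database, it can help to check that every entry carries exactly these fields. A minimal sketch, using a hypothetical validation helper:

```python
# Fields taken from the sample structure documented above.
REQUIRED_KEYS = {
    "question",
    "agent_planning",
    "search_agent_planning",
    "agent_experience",
    "search_agent_experience",
}

def is_valid_sample(entry: dict) -> bool:
    """True if the entry has exactly the required string-valued fields."""
    return set(entry) == REQUIRED_KEYS and all(
        isinstance(v, str) for v in entry.values()
    )
```

Entries that pass this check can then be collected into a list and serialized to ./agent_kb/agent_kb_database.json with json.dump.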
Launch the Agent KB service by running the script below:
python ./agent_kb/agent_kb_service.py

Once everything is configured, run the main script using a command similar to the following example:

python run_gaia.py --model-id openai:gpt-4.1 --model-id-search openai:gpt-4.1 --run-name gpt-4.1-gaia --agent_kb --concurrency 1

To run on SWE-bench, change to the scripts directory:

cd ./Agent-KB-SWE-bench/scripts

Please note that all the files mentioned below are in the "scripts" directory.
bash run_swe_bench_hints_agentless_repo.sh
bash run_swe_bench_agentless.sh
bash run_swebench_eval_hints_agentless_repo.sh

Other scripts support:
- Hint variants: repo_2nd, all_bench_hints
- Baselines: plain, plain_qwen, plain_claude
| Script | Description |
|---|---|
| build_env.sh | Build and set up the Docker-based environment. |
| build.sh | Local build script (e.g., hint databases). |
| run_infer_template.sh | Base template for LLM prompt runs. |
| Script | Hint Source | Description |
|---|---|---|
| run_swe_bench_hints_agentless.sh | Location hints | Location hints for the baseline. |
| run_swe_bench_hints_agentless_repo.sh | RepoClassBench 1st round | Main Agent KB reasoning setup. |
| run_swe_bench_hints_agentless_repo_2nd.sh | RepoClassBench refined | Deeper/longer hints. |
| run_swe_bench_hints_agentless_all_bench_hints.sh | Combined | Merged hints for high coverage. |
| Script | Model | Description |
|---|---|---|
| run_swe_bench_plain.sh | GPT-4 | Plain run on the full benchmark. |
| run_swe_bench_plain_claude.sh | Claude | Claude 3.7 baseline run. |
| run_swe_bench_plain_qwen.sh | Qwen | Qwen 3 baseline run. |
| Script | Matches |
|---|---|
| run_swebench_eval_plain.sh | Baseline on full set |
| run_swebench_eval_agentless.sh | Location hints |
| run_swebench_eval_hints_agentless_repo.sh | Location hints, RepoClassBench hints |
| run_swebench_eval_agentless_repo_2nd.sh | Location hints, RepoClassBench refined subset |
| run_swebench_eval_agentless_all_bench_hints.sh | Location hints, all hints subset |
| run_swebench_eval_plain_claude.sh | Claude baseline eval |
| run_swebench_eval_plain_qwen.sh | Qwen baseline eval |
This work builds upon and adapts code from the official implementations of two prominent open-source frameworks:
- smolagents — a lightweight, flexible, and powerful library designed to enable the rapid development and deployment of AI agents with minimal effort.
- OpenHands — an advanced platform enabling AI-powered software development agents capable of interacting with code, terminals, and APIs just like human developers.
These frameworks were instrumental in validating the effectiveness and performance of our proposed methods. We extend our gratitude to the contributors and maintainers of both projects for their foundational work in advancing agent-based systems.
@article{tang2025agent,
title={Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving},
author={Tang, Xiangru and Qin, Tianrui and Peng, Tianhao and Zhou, Ziyang and Shao, Daniel and Du, Tingting and Wei, Xinming and Xia, Peng and Wu, Fang and Zhu, He and others},
journal={arXiv preprint arXiv:2507.06229},
year={2025}
}
