Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling
ProjectGen is a multi-agent framework that decomposes project-level code generation into three stages, architecture design, skeleton generation, and code filling, supported by iterative refinement and memory-based context management.
The core implementation resides in the src/ directory, which contains three major components: the multi-agent system, the memory management module, and a set of workflow and utility scripts. The directory structure is shown below:
```
src/
├── agents/
│   ├── architecture_agent.py
│   ├── arch_judge_agent.py
│   ├── skeleton_agent.py
│   ├── skeleton_judge_agent.py
│   ├── code_agent.py
│   ├── code_judge_agent.py
│   └── test.py
│
├── memory_manager/
│   ├── arch_memory.py
│   ├── skeleton_memory.py
│   ├── code_memory.py
│   └── __init__.py
│
├── build_dependency_graph.py
├── extract_api.py
├── logger.py
├── main.py
├── prompts.py
├── utils.py
└── workflow.py
```
- agents/: Core agents for the three-stage generation workflow. Each stage pairs a generation agent with a judging agent, forming a generate–evaluate–refine loop (a minimal sketch of this loop appears after this list).
- memory_manager/: Memory modules that preserve intra-stage semantic information, enabling agents to efficiently retrieve relevant context from previous iterations.
- build_dependency_graph.py: Constructs file-level dependency graphs so that source files can be generated in dependency order (see the second sketch after this list).
- extract_api.py: Extracts function signatures from generated code to support subsequent iterations.
- logger.py: A unified logging module for debugging, tracing agent outputs, and monitoring workflow execution.
- main.py: The main entry point of the system.
- prompts.py: Prompt templates used by all agents.
- utils.py: General-purpose utility functions.
- workflow.py: Defines the overall multi-agent generation workflow.
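To make the generate–evaluate–refine pattern concrete, here is a minimal sketch of such a loop. All names (generate, judge, refine_loop, MAX_ROUNDS) are hypothetical stand-ins, not the actual interfaces of the agents in agents/, which call an LLM and exchange much richer feedback:

```python
from typing import Optional, Tuple

MAX_ROUNDS = 3  # hypothetical cap on refinement iterations

def generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a generation agent (e.g., skeleton_agent)."""
    suffix = f" [revised per: {feedback}]" if feedback else ""
    return f"artifact for {task}{suffix}"

def judge(artifact: str) -> Tuple[bool, str]:
    """Stand-in for a judging agent: returns (accepted, feedback)."""
    accepted = "revised" in artifact  # toy acceptance criterion
    return accepted, "add error handling"

def refine_loop(task: str) -> str:
    artifact = generate(task, feedback=None)
    for _ in range(MAX_ROUNDS):
        accepted, feedback = judge(artifact)
        if accepted:
            break
        artifact = generate(task, feedback)
    return artifact

print(refine_loop("design the architecture"))
```

Similarly, file-level dependency graph construction can be pictured as scanning each file's imports and topologically sorting the result so that dependencies are produced first. This sketch uses only the standard library and is an assumption about the approach, not the actual build_dependency_graph.py:

```python
import ast
from pathlib import Path
from graphlib import TopologicalSorter  # stdlib in Python 3.9+

def local_imports(path: Path, project_modules: set) -> set:
    """Collect the imports of `path` that refer to modules inside the project."""
    deps = set()
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.Import):
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module.split(".")[0])
    return deps & project_modules  # keep only project-internal edges

def generation_order(src_dir: str) -> list:
    """Order files so each one's dependencies are generated before it."""
    files = {p.stem: p for p in Path(src_dir).rglob("*.py")}
    graph = {name: local_imports(p, set(files)) for name, p in files.items()}
    # static_order() raises CycleError if the imports are circular
    return [files[name] for name in TopologicalSorter(graph).static_order()]
```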
Create and activate a conda environment, then install the dependencies:
```bash
conda create -n projectgen python=3.9
conda activate projectgen
pip install -r requirements.txt
```
Open src/utils.py and fill in your OpenAI API key:
```python
os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"
```
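Alternatively, if you would rather not hard-code the key, you can export it in your shell before launching, provided the assignment in utils.py is removed or guarded so it does not overwrite the environment:

```bash
export OPENAI_API_KEY="your_openai_api_key_here"
```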
Navigate to the src directory and run the main script:
```bash
cd src
python main.py --dataset=CodeProjectEval
```
The outputs will be stored in the CodeProjectEval_outputs/ folder.
To better reflect real-world project scenarios and support evaluation through executable test cases, we construct a new project-level code generation dataset, CodeProjectEval, which consists of 18 Python repositories covering a wide range of topics.
| Repository | #Files | #LOC | Complexity | #Check Tests (cov.) | #Unit Tests (cov.) | #PRD Tokens |
|---|---|---|---|---|---|---|
| bplustree | 8 | 1,509 | 2.29 | 8 (82%) | 356 (98%) | 1,339 |
| cookiecutter | 18 | 2,805 | 3.42 | 7 (55%) | 375 (99%) | 2,100 |
| csvs-to-sqlite | 3 | 816 | 5.83 | 10 (81%) | 25 (88%) | 1,841 |
| deprecated | 3 | 597 | 4.08 | 26 (80%) | 176 (95%) | 953 |
| djangorestframework-simplejwt | 31 | 2,014 | 2.09 | 8 (63%) | 191 (93%) | 1,614 |
| flask | 24 | 9,314 | 2.71 | 25 (52%) | 482 (91%) | 2,913 |
| imapclient | 17 | 3,531 | 2.81 | 9 (40%) | 267 (80%) | 3,810 |
| parsel | 5 | 1,128 | 2.60 | 5 (65%) | 250 (95%) | 1,522 |
| portalocker | 9 | 1,958 | 2.84 | 10 (58%) | 71 (94%) | 1,990 |
| pyjwt | 12 | 2,690 | 3.01 | 10 (53%) | 294 (94%) | 382 |
| python-hl7 | 11 | 2,434 | 2.98 | 10 (56%) | 100 (87%) | 2,292 |
| rsa | 14 | 2,949 | 2.40 | 6 (73%) | 100 (87%) | 3,318 |
| simpy | 12 | 2,184 | 2.01 | 7 (60%) | 149 (90%) | 2,147 |
| tinydb | 10 | 2,170 | 1.76 | 10 (58%) | 204 (95%) | 947 |
| trailscraper | 13 | 890 | 2.01 | 4 (65%) | 93 (92%) | 3,415 |
| voluptuous | 7 | 3,100 | 2.55 | 11 (55%) | 161 (90%) | 1,221 |
| xmnlp | 24 | 1,504 | 3.47 | 8 (65%) | 23 (81%) | 3,105 |
| zxcvbn | 8 | 1,402 | 5.69 | 6 (81%) | 31 (84%) | 2,399 |
| Avg. | 12.7 | 2,388.6 | 3.03 | 10 (63.4%) | 186 (90.7%) | 2,067 |
| Med. | 11.5 | 2,092 | 2.76 | 8.5 (61.5%) | 168.5 (91.5%) | 2,100 |
Each repository is supplemented with:
- docs/PRD.md: a product requirements document describing the system's functional and non-functional requirements, guiding subsequent design and development.
- docs/UML_pyreverse.md: UML class and package diagrams generated by Pyreverse.
- docs/architecture_design.md: the directory tree of the repository and a description of each source file, with summaries of the classes and functions it contains.
- check_tests/: a small test suite used for initial verification during the code generation process.
- unit_tests/: the full test suite, executed after code generation completes to evaluate the overall quality and functional correctness of the generated project.
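Since both directories contain ordinary test suites, they can presumably also be run directly with pytest (an assumption about the layout; the evaluation scripts below are the supported path):

```bash
pytest check_tests/ -q   # quick verification during generation
pytest unit_tests/ -q    # full functional evaluation afterwards
```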
- Similarity-based Evaluation (SketchBLEU)
We adopt SketchBLEU, a similarity-based metric originally introduced in the CodeS framework. This metric evaluates the structural similarity between the generated code and the reference implementation. To calculate SketchBLEU:
```bash
cd datasets/evaluation
python calc_sketchbleu.py
```
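The structural comparison can be pictured as follows: abstract each program into a "sketch" that keeps keywords, operators, and layout while masking user-defined identifiers and literals, then measure n-gram overlap between the sketches. The snippet below is a simplified illustration of that idea, not the calc_sketchbleu.py implementation:

```python
import io, keyword, tokenize
from collections import Counter

def sketch(code: str) -> list:
    """Mask identifiers and literals, keeping keywords and structure."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out

def ngram_precision(cand: list, ref: list, n: int = 4) -> float:
    """Clipped n-gram precision between two token sketches."""
    c = Counter(zip(*(cand[i:] for i in range(n))))
    r = Counter(zip(*(ref[i:] for i in range(n))))
    overlap = sum(min(cnt, r[g]) for g, cnt in c.items())
    return overlap / max(1, sum(c.values()))

gen = "def add(a, b):\n    return a + b\n"
ref = "def plus(x, y):\n    return x + y\n"
print(ngram_precision(sketch(gen), sketch(ref)))  # 1.0: identical sketches
```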
- Unit Test–based Evaluation
For functional correctness evaluation, each repository contains a set of unit tests. To run the unit tests:
```bash
cd datasets/evaluation
python calc_passrate.py
```
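The pass rate is the fraction of unit tests that pass. For a quick manual check, the same counts can be read off a plain pytest run (illustrative; calc_passrate.py is the supported path):

```bash
pytest unit_tests/ -q --tb=no   # summary reports counts, e.g. "180 passed, 6 failed"
```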