ProjectEval is a multi-level benchmark for evaluating LLMs on complex project-level code generation tasks. It simulates realistic software engineering workflows by combining natural language prompts, structured checklists, and code skeletons.
📄 Paper | 🚀 Project | ✉️ Contact Us | 📤 Submit Your Model's Result
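To make the multi-level setup above concrete, here is a minimal, purely illustrative sketch of how a task combining a natural language prompt, a checklist, and a code skeleton might be represented, and which inputs a model could see at each level. All field names, the example task, and the level-to-input mapping are hypothetical assumptions for illustration, not ProjectEval's actual schema.

```python
# Hypothetical sketch only: field names and the level mapping below are
# illustrative assumptions, not ProjectEval's real data format.
example_task = {
    "level": 1,
    "prompt": "Build a command-line to-do list application.",
    "checklist": [
        "Supports adding a task",
        "Supports listing all tasks",
    ],
    "skeleton": "def add_task(tasks, item):\n    ...\n",
}

def inputs_for_level(task):
    """Select which task fields a model would see at each level.

    Assumed mapping for illustration: Level 1 gets only the natural
    language prompt; Level 2 adds the checklist; Level 3 adds the
    code skeleton as well.
    """
    keys_by_level = {
        1: ["prompt"],
        2: ["prompt", "checklist"],
        3: ["prompt", "checklist", "skeleton"],
    }
    return {key: task[key] for key in keys_by_level[task["level"]]}
```

Under this sketch, a Level 1 run would hand the model just the prompt, while a Level 3 run would also supply the checklist and skeleton.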
| Model | Report By | Report Date | Output Format | Cascade Level 1 | Cascade Level 2 | Cascade Avg. | Direct Level 1 | Direct Level 2 | Direct Level 3 | Direct Avg. | All Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Report By | Report Date | Output Format | Cascade Level 1 CL | Cascade Level 1 SK | Cascade Level 1 Code | Cascade Level 1 PV | Cascade Level 2 SK | Cascade Level 2 Code | Cascade Level 2 PV | Direct Level 1 Code | Direct Level 1 PV | Direct Level 2 Code | Direct Level 2 PV | Direct Level 3 Code | Direct Level 3 PV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|