Overview
Support CRUD operations and management of full datasets (input, output, context, etc.) in the Dev UI. This adds first-class support for all evaluation use cases (e.g., prod traces) in the Dev UI.
This work is a blocker for agent evals and for supporting interrupts in evals.
User goal(s)
Create and manage full datasets in the Dev UI
Requirements
Acceptance Criteria
- 1. Create full datasets
- 2. Edit, update, and delete examples; delete a dataset
- 3. Run evaluation (without inference) from the dataset
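To make the acceptance criteria concrete, the following is a minimal sketch of the data model and operations they imply. It is an illustration under stated assumptions, not the Dev UI's actual API: the names `Example`, `Dataset`, `InMemoryDatasetStore`, and `evaluateWithoutInference` are all hypothetical, and the in-memory store stands in for whatever persistence the Dev UI ends up using.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Dev UI codebase.
interface Example {
  id: string;
  input: unknown;
  output?: unknown; // pre-recorded output (e.g. from a prod trace), so evaluation can skip inference
  context?: unknown[];
}

interface Dataset {
  id: string;
  examples: Example[];
}

interface ScoredExample {
  exampleId: string;
  score: number;
}

class InMemoryDatasetStore {
  private datasets = new Map<string, Dataset>();
  private nextId = 1;

  private newExampleId(): string {
    return `ex-${this.nextId++}`;
  }

  // Criterion 1: create a full dataset (input plus optional output and context).
  createDataset(id: string, examples: Omit<Example, "id">[] = []): Dataset {
    if (this.datasets.has(id)) throw new Error(`dataset ${id} already exists`);
    const dataset: Dataset = {
      id,
      examples: examples.map((e) => ({ ...e, id: this.newExampleId() })),
    };
    this.datasets.set(id, dataset);
    return dataset;
  }

  getDataset(id: string): Dataset {
    const dataset = this.datasets.get(id);
    if (!dataset) throw new Error(`dataset ${id} not found`);
    return dataset;
  }

  // Criterion 2: update a single example in place.
  updateExample(datasetId: string, exampleId: string, patch: Partial<Omit<Example, "id">>): Example {
    const example = this.getDataset(datasetId).examples.find((e) => e.id === exampleId);
    if (!example) throw new Error(`example ${exampleId} not found`);
    Object.assign(example, patch);
    return example;
  }

  // Criterion 2: delete a single example.
  deleteExample(datasetId: string, exampleId: string): void {
    const dataset = this.getDataset(datasetId);
    dataset.examples = dataset.examples.filter((e) => e.id !== exampleId);
  }

  // Criterion 2: delete the whole dataset; returns false if it did not exist.
  deleteDataset(id: string): boolean {
    return this.datasets.delete(id);
  }
}

// Criterion 3: score the outputs already stored in the dataset, without running inference.
// Examples that have no recorded output are skipped.
function evaluateWithoutInference(
  dataset: Dataset,
  scorer: (example: Example) => number,
): ScoredExample[] {
  return dataset.examples
    .filter((e) => e.output !== undefined)
    .map((e) => ({ exampleId: e.id, score: scorer(e) }));
}
```

In this sketch, evaluation without inference means the scorer only sees outputs that were stored with the dataset. Skipping examples without an output is an assumption of the sketch; the real feature might instead reject such datasets.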