Commit cd3835a

fix(py):restore flow samples after accidental deletion and modify README (#4024)
Co-authored-by: Mengqin Shen <mengqin@google.com>
1 parent b71dcc7 commit cd3835a

2 files changed: +11 -48 lines changed


‎py/samples/evaluator-demo/README.md‎

Lines changed: 8 additions & 48 deletions
@@ -12,35 +12,15 @@ Note: This sample focuses on evaluation features in Genkit, by utilizing the off
 genkit start -- uv run src/main.py
 # This command should output the link to the Genkit Dev UI.
 ```
+Choose any flow of interest and run in the Dev UI.
+## Available Flows
 
-The rest of the commands in this guide can be run in a separate terminal or directly in the Dev UI.
-
-### Initial Setup
-
-```bash
-# Index "docs/cat-handbook.pdf" to start
-# testing Genkit evaluation features. Please see
-# src/setup.py for more details.
-
-genkit flow:run setup
-```
-
-## Evaluations
-
-### Running Evaluations via CLI
-
-Use the `eval:flow` command to run a flow against a dataset and evaluate the outputs:
-
-```bash
-# Evaluate with a specific evaluator
-genkit eval:flow pdf_qa --input data/cat_adoption_questions.json --evaluator=custom/test_evaluator
-
-# Evaluate with multiple evaluators
-genkit eval:flow pdf_qa --input data/cat_adoption_questions.json --evaluator=genkitEval/faithfulness --evaluator=genkitEval/maliciousness
-
-# Evaluate with all available evaluators (omit --evaluator flag)
-genkit eval:flow pdf_qa --input data/cat_adoption_questions.json
-```
+- **setup**: Indexes the default PDF document (`docs/cat-handbook.pdf`) into the vector store
+- **pdf_qa**: RAG flow that answers questions based on indexed PDF documents. It requires `setup` flow run first.
+- **index_pdf**: Indexes a specified PDF file (defaults to `docs/cat-wiki.pdf`)
+- **simple_structured**: Simple flow with structured input/output
+- **simple_echo**: Simple echo flow
+- **dog_facts_eval**: Programmatic evaluation flow using the faithfulness metric on a dog facts dataset. **Note:** This flow can take several minutes to complete.
 
 ### Running Evaluations in Dev UI
 
@@ -57,26 +37,6 @@ genkit eval:flow pdf_qa --input data/cat_adoption_questions.json
 4. Click **"Run"**
 5. View results in the Evaluations tab
 
-### Programmatic Evaluation
-
-The `dog_facts_eval` flow demonstrates running evaluations from code. See `src/eval_in_code.py` for implementation details.
-
-```bash
-# Run programmatic evaluation
-genkit flow:run dog_facts_eval
-```
-
-**Note:** The `dog_facts_eval` flow evaluates 20 test cases with the faithfulness metric, making 40 LLM API calls. This takes approximately 5 minutes to complete.
-
-## Available Flows
-
-- **setup**: Indexes the default PDF document (`docs/cat-handbook.pdf`) into the vector store
-- **index_pdf**: Indexes a specified PDF file (defaults to `docs/cat-wiki.pdf`)
-- **pdf_qa**: RAG flow that answers questions based on indexed PDF documents. It requires `setup` flow run first.
-- **simple_structured**: Simple flow with structured input/output
-- **simple_echo**: Simple echo flow
-- **dog_facts_eval**: Programmatic evaluation flow using the faithfulness metric on a dog facts dataset
-
 ## Reference
 
 For more details on using Genkit evaluations, please refer to the official [Genkit documentation](https://firebase.google.com/docs/genkit/evaluation).
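
The note removed from this README says that `dog_facts_eval` evaluates 20 test cases, makes 40 LLM API calls, and takes approximately 5 minutes. As a rough back-of-the-envelope check, the sketch below derives the calls per test case and the average time per call. It assumes the calls run sequentially and are evenly paced, which the README does not state.

```python
# Figures taken from the removed README note.
test_cases = 20
llm_calls = 40
total_minutes = 5  # "approximately 5 minutes"

# Average number of LLM calls made for each test case.
calls_per_case = llm_calls / test_cases

# Assumption (not from the README): calls are sequential and evenly paced.
seconds_per_call = total_minutes * 60 / llm_calls

print(f"{calls_per_case:.0f} calls per test case")
print(f"about {seconds_per_call:.1f} s per call")
```

This is consistent with the shorter replacement wording, "can take several minutes to complete", which no longer commits to specific counts.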

‎py/samples/evaluator-demo/src/main.py‎

Lines changed: 3 additions & 0 deletions
@@ -16,7 +16,10 @@
 
 import random
 
+from eval_in_code import dog_facts_eval_flow
 from genkit_demo import ai
+from pdf_rag import index_pdf, pdf_qa, simple_echo, simple_structured
+from setup import setup
 
 from genkit.core.typing import BaseEvalDataPoint, EvalStatusEnum, Score
 
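
The three added imports are not referenced by name in the lines shown, which suggests they may be there for their side effect: in decorator-based frameworks, importing a module is often what registers the flows defined in it, so dropping the import makes the flows disappear. The following self-contained sketch illustrates that general pattern. The `FLOWS` registry and `flow` decorator are hypothetical names for illustration only; this is not Genkit's API or implementation.

```python
from typing import Callable

# Hypothetical registry standing in for a framework's table of known flows.
FLOWS: dict[str, Callable] = {}


def flow(name: str) -> Callable[[Callable], Callable]:
    """Hypothetical decorator: records the function under `name` when it is defined."""

    def register(fn: Callable) -> Callable:
        FLOWS[name] = fn
        return fn

    return register


# In a real project this definition would live in its own module, so the
# decorator would only run (and register the flow) when that module is imported.
@flow("simple_echo")
def simple_echo(text: str) -> str:
    return text


print(sorted(FLOWS))
print(FLOWS["simple_echo"]("hi"))
```

Under this pattern, an import that looks unused to a linter can still be required at runtime, which is one plausible way flows get deleted by accident.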
