This sample demonstrates the different evaluation features of the Genkit Python SDK, using flows that run on the Google GenAI plugin.

Note: This sample focuses on evaluation features in Genkit by utilizing the official Genkit Evaluators plugin. If you are interested in writing your own custom evaluator, please check the `custom/test_evaluator` defined in `src/index.py`.
## Setup and start the sample

Obtain an API key from [ai.dev](https://ai.dev) and export it as the `GEMINI_API_KEY` environment variable in your shell configuration.
```bash
export GEMINI_API_KEY='<Your api key>'

# Start the Genkit Dev UI
genkit start -- uv run samples/evaluator-demo/src/index.py
# This command should output the link to the Genkit Dev UI.
```
The rest of the commands in this guide can be run in a separate terminal or directly in the Dev UI.

### Initial Setup
```bash
# Index "docs/cat-handbook.pdf" to start testing Genkit evaluation
# features. Please see src/setup.py for more details.

genkit flow:run setup
```
## Evaluations
### Running Evaluations via CLI

Use the `eval:flow` command to run a flow against a dataset and evaluate the outputs:
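A minimal sketch of such a run, using the `pdf_qa` flow from this sample; the dataset file name is an illustrative placeholder, not a file shipped with the sample:

```bash
# Run the pdf_qa flow over a JSON dataset and evaluate its outputs
# (the dataset path below is a placeholder).
genkit eval:flow pdf_qa --input eval_inputs.json
```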
The Genkit Evaluators plugin provides metrics such as:

- `genkitEval/answer_relevancy` - Checks if the answer is relevant to the question

Evaluations can also be configured and run from the Dev UI: click **"Run"** and view the results in the **Evaluations** tab.
### Programmatic Evaluation

The `dog_facts_eval` flow demonstrates running evaluations from code. See `src/eval_in_code.py` for implementation details.
```bash
# Run programmatic evaluation
genkit flow:run dog_facts_eval
```
**Note:** The `dog_facts_eval` flow evaluates 20 test cases with the faithfulness metric, making 40 LLM API calls. This takes approximately 5 minutes to complete.
## Available Flows

- **setup**: Indexes the default PDF document (`docs/cat-handbook.pdf`) into the vector store
- **index_pdf**: Indexes a specified PDF file (defaults to `docs/cat-wiki.pdf`)
- **pdf_qa**: RAG flow that answers questions based on indexed PDF documents. It requires the `setup` flow to be run first.
- **simple_structured**: Simple flow with structured input/output
- **simple_echo**: Simple echo flow
- **dog_facts_eval**: Programmatic evaluation flow using the faithfulness metric on a dog facts dataset
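As a quick sketch, any of these flows can be invoked from the CLI with `genkit flow:run`; the question below is an illustrative input and assumes the flow accepts a plain string:

```bash
# Ask the RAG flow a question about the indexed handbook
# (run the setup flow first; the question is illustrative).
genkit flow:run pdf_qa '"What should I feed my cat?"'
```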
## Reference

For more details on using Genkit evaluations, please refer to the official [Genkit documentation](https://firebase.google.com/docs/genkit/evaluation).