
Commit 06c8aeb

Update the Judge LLM settings in the examples to avoid retries (#204)
Closes #202

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Eric Evans II (https://github.com/ericevans-nv)

URL: #204
1 parent 7a72e01 commit 06c8aeb
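
Summarizing the change: across the docs and example configs, this commit drops the near-zero `temperature`/`top_p` sampling settings and the 2-token output cap on the judge LLM, replacing them with a plain 8-token budget (and, where the configs differed, standardizing the judge model on `meta/llama-3.1-70b-instruct`). The resulting judge-LLM entry, as it appears in the updated configs:

```yaml
llms:
  nim_rag_eval_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    max_tokens: 8
```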

File tree: 11 files changed, +23 −35 lines

`docs/source/guides/evaluate.md` (+12 −4)

````diff
@@ -112,11 +112,19 @@ These metrics use a judge LLM for evaluating the generated output and retrieved
 llms:
   nim_rag_eval_llm:
     _type: nim
-    model_name: meta/llama-3.3-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    model_name: meta/llama-3.1-70b-instruct
+    max_tokens: 8
 ```
+For these metrics, it is recommended to use 8 tokens for the judge LLM.
+
+Evaluation is dependent on the judge LLM's ability to accurately evaluate the generated output and retrieved context. This is the leadership board for the judge LLM:
+```
+1)- mistralai/mixtral-8x22b-instruct-v0.1
+2)- mistralai/mixtral-8x7b-instruct-v0.1
+3)- meta/llama-3.1-70b-instruct
+4)- meta/llama-3.3-70b-instruct
+```
+For a complete list of up-to-date judge LLMs, refer to the [RAGAS NV metrics leadership board](https://github.com/explodinggradients/ragas/blob/main/src/ragas/metrics/_nv_metrics.py)
 
 ### Trajectory Evaluator
 This evaluator uses the intermediate steps generated by the workflow to evaluate the workflow trajectory. The evaluator configuration includes the evaluator type and any additional parameters required by the evaluator.
````
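
The retries mentioned in the commit title typically arise when the evaluator cannot parse a truncated judge reply and re-issues the call. A minimal illustration of that failure mode (the `parse_verdict` helper and the JSON verdict shape below are hypothetical, not AIQ Toolkit or RAGAS APIs): a 2-token cap can cut the judge's reply off mid-structure, while an 8-token budget leaves room for a complete verdict.

```python
import json

def parse_verdict(reply: str) -> int:
    """Hypothetical helper: parse a JSON verdict such as {"score": 1}."""
    return json.loads(reply)["score"]

complete = '{"score": 1}'   # fits comfortably within an 8-token budget
truncated = '{"scor'        # the kind of fragment a 2-token cap can leave behind

assert parse_verdict(complete) == 1

retried = False
try:
    parse_verdict(truncated)
except json.JSONDecodeError:
    retried = True  # a real evaluator would re-issue the judge call here
assert retried
```

This is also why the near-zero `temperature`/`top_p` overrides could be dropped: once the reply fits the budget, exotic sampling settings are no longer needed to coax out a parseable answer.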

`examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml` (+1 −3)

```diff
@@ -35,9 +35,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_rag_eval_large_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-llama-3.1-8b-instruct.yml` (+1 −3)

```diff
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-llama-3.3-70b-instruct.yml` (+1 −3)

```diff
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-mixtral-8x22b-instruct-v0.1.yml` (+1 −3)

```diff
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-phi-3-medium-4k-instruct.yml` (+1 −3)

```diff
@@ -50,9 +50,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-phi-3-mini-4k-instruct.yml` (+1 −3)

```diff
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/email_phishing_analyzer/configs/config-reasoning.yml` (+1 −3)

```diff
@@ -60,9 +60,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   r1_model:
     _type: nim
     model_name: deepseek-ai/deepseek-r1
```

`examples/email_phishing_analyzer/configs/config.yml` (+1 −3)

```diff
@@ -51,9 +51,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/simple/src/aiq_simple/configs/eval_config.yml` (+1 −3)

```diff
@@ -34,9 +34,7 @@ llms:
   nim_rag_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 6
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```

`examples/simple/src/aiq_simple/configs/eval_upload_config.yml` (+2 −4)

```diff
@@ -38,10 +38,8 @@ llms:
     temperature: 0.0
   nim_rag_eval_llm:
     _type: nim
-    model_name: meta/llama-3.3-70b-instruct
-    temperature: 0.0000001
-    top_p: 0.0001
-    max_tokens: 2
+    model_name: meta/llama-3.1-70b-instruct
+    max_tokens: 8
   nim_trajectory_eval_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
```
