changes in ragas_metrics module #2220
Open
Description
This PR updates Giskard’s RAG evaluation to be compatible with the async metrics API introduced in `ragas>=0.3.x`, and fixes the original `RagasLLMWrapper` instantiation issue.

In ragas 0.3.x, two breaking changes affected Giskard:
- `BaseRagasLLM` gained a new abstract method, `is_finished()`, which made `RagasLLMWrapper` fail at instantiation.
- Metrics such as `AnswerRelevancy` and `Faithfulness` moved to an async, sample-based API (`SingleTurnSample` + `single_turn_ascore` / `multi_turn_ascore`), replacing the old synchronous `.score(dict)` / `.single_turn_score(...)` API.

What this PR does
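For context on the wrapper fix: when a base class gains a new abstract method, every existing subclass becomes uninstantiable until it implements that method. The sketch below shows this standard Python behaviour with stand-in classes. `BaseLLMStandIn`, `OldWrapper` and `FixedWrapper` are hypothetical names, not the ragas or Giskard classes, and returning `True` from `is_finished()` is a placeholder rather than a claim about what the PR's implementation returns.

```python
from abc import ABC, abstractmethod


class BaseLLMStandIn(ABC):
    """Hypothetical stand-in for a base LLM class (not the ragas class)."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        ...

    # The newly added abstract method: subclasses written before it
    # existed no longer satisfy the interface.
    @abstractmethod
    def is_finished(self, response) -> bool:
        ...


class OldWrapper(BaseLLMStandIn):
    """Wrapper written against the old interface (no is_finished)."""

    def generate_text(self, prompt: str) -> str:
        return "answer"


class FixedWrapper(OldWrapper):
    """Same wrapper with the missing method implemented."""

    def is_finished(self, response) -> bool:
        # Placeholder for the sketch; real logic would inspect the response.
        return True


# Instantiating the old wrapper fails because one abstract method is unimplemented.
try:
    OldWrapper()
    old_wrapper_error = None
except TypeError as exc:
    old_wrapper_error = exc

# The fixed wrapper can be instantiated again.
fixed = FixedWrapper()
```

Python checks abstract methods at instantiation time, not at class definition, which is why a failure of this kind only surfaces once the wrapper is actually constructed.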
Fix LLM wrapper instantiation
- Implement `is_finished()` in `RagasLLMWrapper` to conform to the new `BaseRagasLLM` interface.

Adopt the async metrics API with `SingleTurnSample`
In `RagasMetric.__call__`:
- We continue to build the existing `ragas_sample` dict via `prepare_ragas_sample(...)`.
- We then construct a `SingleTurnSample`.
- Metric evaluation is now done via the async RAGAS API, using an internal coroutine.
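A minimal sketch of how these steps could fit together, assuming stand-ins throughout so it runs without ragas installed. `SampleStandIn` mirrors field names of ragas' `SingleTurnSample` (`user_input`, `response`, `retrieved_contexts`, `reference`) but is not that class; `MetricStandIn`, `compute_metric`, and the dict keys `question`, `answer`, `contexts`, `ground_truth` are hypothetical and may differ from the PR's actual code.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleStandIn:
    """Stand-in for a single-turn sample (not the ragas class)."""
    user_input: str
    response: str
    retrieved_contexts: List[str] = field(default_factory=list)
    reference: Optional[str] = None


class MetricStandIn:
    """Hypothetical metric exposing an async single-turn scoring method."""
    name = "test"

    async def single_turn_ascore(self, sample: SampleStandIn) -> float:
        # Placeholder for the LLM calls a real metric would await.
        await asyncio.sleep(0)
        return 0.5


def compute_metric(metric, ragas_sample: dict) -> dict:
    # Map the existing dict onto the sample object (keys are assumed).
    sample = SampleStandIn(
        user_input=ragas_sample["question"],
        response=ragas_sample["answer"],
        retrieved_contexts=ragas_sample.get("contexts") or [],
        reference=ragas_sample.get("ground_truth"),
    )

    # Internal coroutine wrapping the async metric call.
    async def _score() -> float:
        return await metric.single_turn_ascore(sample)

    # asyncio.run starts a fresh event loop for this call; it raises
    # RuntimeError if invoked from inside an already running loop.
    return {metric.name: asyncio.run(_score())}


result = compute_metric(
    MetricStandIn(),
    {"question": "What is RAG?", "answer": "Retrieval-augmented generation.",
     "contexts": ["RAG combines retrieval with generation."]},
)
```

Driving the coroutine with `asyncio.run` keeps the caller synchronous, but it does not work inside an environment that already runs an event loop (for example a notebook); how the PR handles that case is not stated here.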
This aligns Giskard with the current RAGAS metric interface, where metrics are evaluated asynchronously over `SingleTurnSample` (or multi-turn equivalents).

Update tests to use `AsyncMock`
`tests/rag/test_ragas_metrics.py` is updated so that metric mocks reflect the new async API:
- `AsyncMock` is used to mock `single_turn_ascore` in:
  - `test_ragas_metric_computation_with_context`
  - `test_ragas_metric_computation`
- Tests still assert the same observable behaviour: `RagasMetric.__call__` returns a dict like `{"test": 0.5}`.
- Existing behaviour for the “missing context” case is preserved; the corresponding test continues to check that we return `0.0` and log a warning.

Behavioural impact
- RAG evaluation now works again with `ragas>=0.3.x`, using the async metric API and `SingleTurnSample`.
- The original `RagasLLMWrapper` instantiation error is fixed by implementing `is_finished()` in the wrapper.
- Metric computation is driven by `single_turn_ascore` / `multi_turn_ascore`, matching the current RAGAS documentation for legacy metrics.

Related Issue
This PR fixes #2218.
Type of Change
Checklist
- `CODE_OF_CONDUCT.md` document.
- `CONTRIBUTING.md` guide.
- `pdm.lock` by running `pdm update-lock` (only applicable if `pyproject.toml` was modified).