changes in ragas_metrics module #2220

MukulLambat · 2025-11-19T23:53:02Z

Description

This PR updates Giskard’s RAG evaluation to be compatible with the async metrics API introduced in ragas>=0.3.x, while fixing the original RagasLLMWrapper instantiation issue.

ragas 0.3.x introduced two breaking changes that affected Giskard:

BaseRagasLLM added a new abstract method is_finished(), which caused:

TypeError: Can't instantiate abstract class RagasLLMWrapper with abstract method is_finished

Metrics such as AnswerRelevancy and Faithfulness moved to an async, sample-based API (SingleTurnSample with single_turn_ascore / multi_turn_ascore), replacing the old synchronous .score(dict) / .single_turn_score(...) API.

What this PR does

Fix LLM wrapper instantiation
- Implements is_finished() in RagasLLMWrapper to conform to the new BaseRagasLLM interface.
- The implementation inspects generation metadata / finish reasons so that RAGAS can correctly decide when a completion is done.
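
As a rough illustration of the finish-reason check described above, here is a minimal sketch. It is not the code from this PR: `Generation` and `LLMResult` are plain dataclass stand-ins for the result objects the wrapper would receive, and the set of accepted finish reasons is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins for the generation result objects (assumption).
@dataclass
class Generation:
    text: str
    generation_info: Optional[Dict[str, Any]] = None


@dataclass
class LLMResult:
    generations: List[List[Generation]] = field(default_factory=list)


# Assumed set of finish reasons that count as a complete answer.
ACCEPTED_FINISH_REASONS = {"stop", "STOP", "end_turn"}


def is_finished(response: LLMResult) -> bool:
    """Return True unless some generation reports a non-accepted finish reason."""
    for candidates in response.generations:
        for gen in candidates:
            info = gen.generation_info or {}
            reason = info.get("finish_reason")
            # Missing metadata is treated as finished (an assumption of this sketch).
            if reason is not None and reason not in ACCEPTED_FINISH_REASONS:
                return False
    return True
```

Treating missing metadata as finished keeps providers that report no finish reason usable; a stricter wrapper could return False in that case instead.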

Adopt the async metrics API with SingleTurnSample

In RagasMetric.__call__:

We continue to build the existing ragas_sample dict via prepare_ragas_sample(...).

We then construct a SingleTurnSample:

sample = SingleTurnSample(
    user_input=ragas_sample.get("user_input") or ragas_sample.get("question"),
    response=ragas_sample.get("response") or ragas_sample.get("answer"),
    retrieved_contexts=ragas_sample.get("retrieved_contexts") or ragas_sample.get("contexts"),
    reference=ragas_sample.get("reference") or ragas_sample.get("ground_truth"),
)
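
The `or` chains above fall back to the legacy key whenever the new key is missing or falsy. A small sketch of that behaviour, using a hypothetical helper `pick` that is not part of Giskard or ragas:

```python
from typing import Any, Dict, Optional


def pick(sample: Dict[str, Any], new_key: str, legacy_key: str) -> Optional[Any]:
    """Mirror `sample.get(new_key) or sample.get(legacy_key)` from the snippet above."""
    return sample.get(new_key) or sample.get(legacy_key)


legacy_only = {"question": "What is RAG?"}
both_keys = {"user_input": "new", "question": "old"}
empty_new = {"user_input": "", "question": "old"}

from_legacy = pick(legacy_only, "user_input", "question")  # legacy key is used
from_new = pick(both_keys, "user_input", "question")       # new key wins
from_fallback = pick(empty_new, "user_input", "question")  # empty string falls through
```

Note that an empty string or empty list under the new key also triggers the fallback, because `or` tests truthiness rather than key presence.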

Metric evaluation is now done via the async RAGAS API, using an internal coroutine:

async def _compute_score():
    if hasattr(self.metric, "single_turn_ascore"):
        return await self.metric.single_turn_ascore(sample)
    elif hasattr(self.metric, "multi_turn_ascore"):
        return await self.metric.multi_turn_ascore(sample)
    else:
        raise AttributeError(
            f"{self.metric} has neither single_turn_ascore nor multi_turn_ascore "
            "— check ragas version or metric type."
        )

loop = asyncio.get_event_loop()
val = loop.run_until_complete(_compute_score())
return {self.name: val}

This aligns Giskard with the current RAGAS metric interface, where metrics are evaluated asynchronously over SingleTurnSample (or multi-turn equivalents).
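
One caveat with driving a coroutine from synchronous code is that `loop.run_until_complete(...)` raises a RuntimeError when an event loop is already running in the same thread (for example inside a notebook). The sketch below shows one possible alternative, not what this PR implements; `run_sync` and `_fake_score` are hypothetical names.

```python
import asyncio
import concurrent.futures


def run_sync(coro_factory):
    """Run the coroutine returned by coro_factory() and return its result."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run creates and closes one.
        return asyncio.run(coro_factory())
    # A loop is already running here, so run the coroutine on a fresh loop
    # in a worker thread and block until it completes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(coro_factory())).result()


async def _fake_score():
    # Stand-in for `self.metric.single_turn_ascore(sample)`.
    return 0.5


score_without_loop = run_sync(_fake_score)


async def _caller_with_loop():
    # Simulates calling the synchronous entry point from async code.
    return run_sync(_fake_score)


score_with_loop = asyncio.run(_caller_with_loop())
```

The worker-thread path blocks the calling loop until the score is ready, which matches the synchronous contract of `__call__` but is not suitable for latency-sensitive async callers.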

Update tests to use AsyncMock
- tests/rag/test_ragas_metrics.py is updated so that metric mocks reflect the new async API:
  - AsyncMock is used to mock single_turn_ascore in:
    - test_ragas_metric_computation_with_context
    - test_ragas_metric_computation
  - Tests still assert the same observable behaviour: RagasMetric.__call__ returns a dict like {"test": 0.5}.
- Existing behaviour for the “missing context” case is preserved; the corresponding test continues to check that we return 0.0 and log a warning.
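
For readers unfamiliar with `AsyncMock`, the mocking pattern described above looks roughly like this. It is a self-contained sketch: `evaluate` is a hypothetical stand-in for `RagasMetric.__call__`, not the actual test code from tests/rag/test_ragas_metrics.py.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def evaluate(metric, sample, name="test"):
    """Hypothetical stand-in for RagasMetric.__call__: await the metric, wrap the score."""
    score = asyncio.run(metric.single_turn_ascore(sample))
    return {name: score}


# A MagicMock metric whose async scoring method resolves to 0.5 when awaited.
metric = MagicMock()
metric.single_turn_ascore = AsyncMock(return_value=0.5)

result = evaluate(metric, sample={"user_input": "q", "response": "a"})

# The observable behaviour is unchanged: a dict keyed by the metric name.
assert result == {"test": 0.5}
metric.single_turn_ascore.assert_awaited_once()
```

A plain `MagicMock` would not work for `single_turn_ascore`, because its return value cannot be awaited; `AsyncMock` returns an awaitable and records the await.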

Behavioural impact

RAG evaluation now works again with ragas>=0.3.x, using the async metric API and SingleTurnSample.

The original error:

Can't instantiate abstract class RagasLLMWrapper with abstract method is_finished

is fixed by implementing is_finished() in the wrapper.

Metric computation is driven by single_turn_ascore / multi_turn_ascore, matching the current RAGAS documentation for legacy metrics.

Related Issue

This PR fixes #2218.

Type of Change

🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)

Checklist

I’ve read the CODE_OF_CONDUCT.md document.
I’ve read the CONTRIBUTING.md guide.
I’ve written tests (or updated existing ones) for the methods and classes I touched.
I’ve updated docstrings where necessary.
I’ve updated pdm.lock by running pdm update-lock (only applicable if pyproject.toml was modified).


SurajBhar · 2025-11-20T00:07:59Z

@kevinmessiaen Please check the new PR with the updated changes. We have reverted the changes made to build-python.yml, so the file is back in its original state. Please cross-check it and let us know your feedback on this PR.

MukulLambat and others added 6 commits November 13, 2025 13:04

changes in ragas_metrics module

0b66a38

Used SingleTurnSample + async scoring

32637d2

Fixed tests with AsyncMock: tests/rag/test_ragas_metrics

daef20e

Solved Flake8 Linting Problem

832854e

Updated build-python.yml

7199cfa

sonarqube-scan-action@v6

Update build-python.yml

d3accdf

MukulLambat mentioned this pull request Nov 20, 2025

changes in ragas_metrics module #2218

Closed

11 tasks
