Conversation

@MukulLambat commented Nov 15, 2025

Description

This PR fixes compatibility issues between Giskard's RAG evaluation and ragas>=0.3.x.

In ragas 0.3.x, the RAGAS API introduced two breaking changes (illustrated in the sketch after this list):

  1. BaseRagasLLM now defines an abstract method is_finished(), which caused:
    TypeError: Can't instantiate abstract class RagasLLMWrapper with abstract method is_finished.

  2. RAGAS metrics (e.g. AnswerRelevancy) no longer expose .score(dict) and instead use
    single_turn_score(SingleTurnSample), causing:
    AttributeError: 'AnswerRelevancy' object has no attribute 'score'.
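
For illustration, here is a minimal sketch of the new scoring flow in ragas 0.3.x (not taken from this PR; metric stands for an already-configured metric instance such as AnswerRelevancy with an evaluator LLM/embeddings set, so the actual scoring calls are shown only as comments):

# Illustrative sketch of the ragas 0.3.x API change, not part of this PR.
from ragas.dataset_schema import SingleTurnSample

sample = SingleTurnSample(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    retrieved_contexts=["Paris is the capital of France."],
)

# ragas >= 0.3 (metric must have an evaluator LLM/embeddings configured):
#     score = metric.single_turn_score(sample)
# ragas < 0.3 accepted a plain dict instead:
#     score = metric.score({"question": ..., "answer": ..., "contexts": [...]})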

This PR:

  • Implements is_finished() in RagasLLMWrapper to conform to the new BaseRagasLLM interface.
  • Updates RagasMetric.__call__ to:
    • Convert the existing ragas_sample dict into a SingleTurnSample.
    • Call metric.single_turn_score(sample) when available.
    • Fall back to metric.score(ragas_sample) for older RAGAS versions (see the sketch after this list).
  • Restores successful execution of RAG-based metrics (Answer Relevancy, Faithfulness, Context Precision, Context Recall, etc.) when using ragas 0.3.x.
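
A minimal sketch of that version-tolerant scoring logic (illustrative only: score_ragas_sample is a hypothetical standalone helper, while the PR applies the same logic inside RagasMetric.__call__; the dict keys "question", "answer", "contexts", "ground_truth" follow the pre-0.3 ragas schema and are an assumption about what prepare_ragas_sample produces):

# Hypothetical helper, not the PR diff: score a ragas sample dict with either
# the new single_turn_score API (ragas >= 0.3) or the legacy .score(dict) API.
from ragas.dataset_schema import SingleTurnSample


def score_ragas_sample(metric, ragas_sample: dict) -> float:
    if hasattr(metric, "single_turn_score"):
        # ragas >= 0.3: metrics score a SingleTurnSample object
        sample = SingleTurnSample(
            user_input=ragas_sample.get("question"),
            response=ragas_sample.get("answer"),
            retrieved_contexts=ragas_sample.get("contexts"),
            reference=ragas_sample.get("ground_truth"),
        )
        return metric.single_turn_score(sample)
    # older ragas versions still expose .score(dict)
    return metric.score(ragas_sample)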

Related Issue

This PR fixes #2217

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've written tests for all new methods and classes that I created.
  • I've written the docstring in Google format for all the methods and classes that I used.
  • I've updated the pdm.lock by running pdm update-lock (only applicable when pyproject.toml has been modified).
@kevinmessiaen self-requested a review November 18, 2025 10:51
@SurajBhar

Hey @MukulLambat – the issue is super relevant for anyone using ragas>=0.3.x with Giskard.

I had a look at your changes + the failing CI, and it seems we now also need to align with the async metric API that RAGAS uses (single_turn_ascore / multi_turn_ascore) instead of the older sync .score(...) / .single_turn_score(...).

Below is a concrete set of changes that should make the PR pass CI and be fully compatible with the current RAGAS API.

Update RagasMetric.__call__ in ragas_metrics.py

Use SingleTurnSample + async scoring, and execute it via asyncio:

# add at the top of ragas_metrics.py
import asyncio
from ragas.dataset_schema import SingleTurnSample

Then, inside RagasMetric.__call__ (where you currently build ragas_sample and compute the score), replace the scoring part with:

ragas_sample = self.prepare_ragas_sample(question_sample, answer)

sample = SingleTurnSample(
    user_input=ragas_sample.get("user_input") or ragas_sample.get("question"),
    response=ragas_sample.get("response") or ragas_sample.get("answer"),
    retrieved_contexts=ragas_sample.get("retrieved_contexts") or ragas_sample.get("contexts"),
    reference=ragas_sample.get("reference") or ragas_sample.get("ground_truth"),
)

async def _compute_score():
    if hasattr(self.metric, "single_turn_ascore"):
        return await self.metric.single_turn_ascore(sample)
    elif hasattr(self.metric, "multi_turn_ascore"):
        return await self.metric.multi_turn_ascore(sample)
    else:
        raise AttributeError(
            f"{self.metric} has neither single_turn_ascore nor multi_turn_ascore "
            "— check ragas version or metric type."
        )

loop = asyncio.get_event_loop()
val = loop.run_until_complete(_compute_score())

return {self.name: val}

This matches the new RAGAS guidance where metrics expose async evaluators like single_turn_ascore(...) / multi_turn_ascore(...) for SingleTurnSample.
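
One caveat with the asyncio.get_event_loop() + run_until_complete pattern above: on newer Python versions, calling asyncio.get_event_loop() with no running loop is deprecated, and run_until_complete raises if a loop is already running (e.g. in a notebook). A small, hypothetical helper, not part of the suggested diff, that covers both cases could look like this:

# Hypothetical helper, not part of the suggested diff: run a coroutine from
# sync code whether or not an event loop is already running in this thread.
import asyncio
import concurrent.futures


def run_coroutine_sync(coro):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # no running loop in this thread: asyncio.run is the simplest option
        return asyncio.run(coro)
    # a loop is already running (e.g. Jupyter): execute the coroutine on its
    # own loop in a worker thread and block until it finishes
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()

RagasMetric.__call__ could then use val = run_coroutine_sync(_compute_score()) instead of managing the loop directly.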

Fix tests with AsyncMock in tests/rag/test_ragas_metrics.py

Since we’re now calling async methods on the metric, the tests need to mock those with AsyncMock instead of MagicMock.
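
A quick standalone illustration (not meant for the test file itself): awaiting the result of a plain MagicMock raises TypeError, while AsyncMock attributes are awaitable and return the configured value.

# Standalone demo, not part of the test file: AsyncMock attributes can be
# awaited, MagicMock call results cannot.
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def demo():
    good = AsyncMock()
    good.single_turn_ascore.return_value = 0.5
    assert await good.single_turn_ascore("sample") == 0.5

    bad = MagicMock()
    try:
        await bad.single_turn_ascore("sample")
    except TypeError:
        print("MagicMock results cannot be awaited")


asyncio.run(demo())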

At the top of tests/rag/test_ragas_metrics.py:

from unittest.mock import MagicMock, Mock, patch, AsyncMock

Then update the two tests to mock single_turn_ascore:

def test_ragas_metric_computation_with_context(ragas_llm_wrapper, ragas_embeddings_wrapper):
    from giskard.rag.metrics.ragas_metrics import RagasMetric

    ragas_metric_mock = AsyncMock()
    ragas_metric_mock.single_turn_ascore.return_value = 0.5

    metric = RagasMetric(
        "test",
        ragas_metric_mock,
        requires_context=True,
        llm_client=Mock(),
        embedding_model=Mock(),
    )

    question_sample = {
        "question": "What is the capital of France?",
        "reference_answer": "Paris",
    }
    answer = AgentAnswer("The capital of France is Paris.", documents=["Paris"])

    result = metric(question_sample, answer)

    assert result == {"test": 0.5}


def test_ragas_metric_computation(ragas_llm_wrapper, ragas_embeddings_wrapper):
    from giskard.rag.metrics.ragas_metrics import RagasMetric

    ragas_metric_mock = AsyncMock()
    ragas_metric_mock.single_turn_ascore.return_value = 0.5

    metric = RagasMetric(
        "test",
        ragas_metric_mock,
        llm_client=Mock(),
        embedding_model=Mock(),
    )

    question_sample = {
        "question": "What is the capital of France?",
        "reference_answer": "Paris",
    }
    answer = AgentAnswer("The capital of France is Paris.")

    result = metric(question_sample, answer)

    assert result == {"test": 0.5}

With these changes:

  • RagasMetric uses the async scoring API (single_turn_ascore / multi_turn_ascore) that RAGAS now documents.
  • The tests correctly mock the async methods and no longer return raw MagicMock(...) objects (which caused the CI failures).
  • CI should be green again once you push the updated commits.

@MukulLambat (Author)

Raised a new PR with a few additional changes. You can check #2220.
