
Support vLLM acceleration during evaluation #24

@Rainier-rq

Description

Thanks for your questions! You can flexibly adjust the inference setup based on your computational resources without affecting results.

If you only want to test a subset, you can filter the query file directly. For a small-scale experiment, however, we don't recommend taking the first 100 queries; please sample randomly instead.

In the repo, we provide two evaluation approaches: using a trained critic model or using external SOTA models as judges. To stay consistent with the latest leaderboard, we recommend the same judge setup it currently uses (Claude-4.5).

Originally posted by @A-Quarter-Mile in #19
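
For the random sampling suggested above, a minimal sketch, assuming the query file is JSONL; the file names and the subset size of 100 are illustrative, not fixed by the repo:

```python
# Minimal sketch: randomly sample 100 queries from a JSONL query file
# instead of taking the first 100. File names here are placeholders.
import json
import random

random.seed(0)  # fix the seed so the sampled subset is reproducible

with open("queries.jsonl", "r", encoding="utf-8") as f:
    queries = [json.loads(line) for line in f]

subset = random.sample(queries, k=min(100, len(queries)))

with open("queries_subset.jsonl", "w", encoding="utf-8") as f:
    for q in subset:
        f.write(json.dumps(q, ensure_ascii=False) + "\n")
```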

Is it possible to modify the code to support using vLLM for acceleration during evaluation (by using the critic model trained in the paper)?
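
vLLM can typically serve a HuggingFace-format checkpoint offline, so a sketch of what this could look like follows, assuming the trained critic is a standard causal LM whose judgments are parsed from generated text; the model path, prompt template, and `pairs` data are placeholders, not the repo's actual interface:

```python
# Minimal sketch: batched critic-model judging with vLLM offline inference.
# ASSUMPTIONS: the critic is a causal LM in HuggingFace format and its
# judgments are parsed from generated text. Paths/prompts are placeholders.
from vllm import LLM, SamplingParams

# Example (query, response) pairs; in practice these would come from
# the filtered query file and the model outputs being evaluated.
pairs = [("What is 2 + 2?", "4")]

llm = LLM(model="path/to/trained-critic")  # placeholder checkpoint path
params = SamplingParams(temperature=0.0, max_tokens=512)  # deterministic judging

# Build one judging prompt per pair (hypothetical template).
prompts = [
    f"Evaluate the following response.\n\nQuery: {q}\n\nResponse: {r}\n\nJudgment:"
    for q, r in pairs
]

# vLLM batches and schedules all prompts internally (continuous batching),
# which is where the speedup over sequential HF generation comes from.
outputs = llm.generate(prompts, params)
judgments = [o.outputs[0].text for o in outputs]
print(judgments)
```

Whether this drops in cleanly depends on how the evaluation script calls the critic: if the critic scores with a classification or reward head rather than by generating text, vLLM's `generate` API would not apply directly.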
