
Conversation

@sitammeur (Contributor) commented Aug 11, 2025

Summary by CodeRabbit

  • New Features

    • Added support for GPT-5 and Claude Opus 4.1 in the model list.
    • Updated fallback behavior: when a selected model isn’t available, defaults to Claude Opus 4.1 for Model 1.
  • Refactor

    • Enhanced routing so GPT models use a dedicated path, improving reliability and consistency of responses.
  • Style

    • Rebranded UI: page title and main header updated to “⚔️ CodeArena: Compare Codegen Models.”
coderabbitai bot commented Aug 11, 2025

Walkthrough

The UI title/header was updated. The default fallback for model1 changed to “Claude Opus 4.1.” The model catalog replaced “Claude Sonnet 4” and “GPT-4.1” with “Claude Opus 4.1” and “GPT-5.” Response routing now sends “GPT” models to OpenAI (OPENAI_API_KEY, max_completion_tokens) and others to OpenRouter (OPENROUTER_API_KEY, max_tokens).

Changes

  • UI branding & default fallback — code-model-comparison/app.py
    Updated page title/header text. Adjusted the default fallback for model1 to “Claude Opus 4.1” when it is unavailable; the model2 fallback is unchanged.
  • Model catalog & routing logic — code-model-comparison/model_service.py
    Updated AVAILABLE_MODELS (removed “Claude Sonnet 4” and “GPT-4.1”; added “Claude Opus 4.1” and “GPT-5”). get_model_response_async now routes models containing “GPT” to OpenAI with OPENAI_API_KEY and max_completion_tokens=2000; all other models go to OpenRouter with OPENROUTER_API_KEY and max_tokens=2000.
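
For orientation, the routing described above can be sketched as follows. This is an illustration of the described behavior, assuming LiteLLM's acompletion as the client; the dictionary contents mirror the walkthrough rather than the exact file.

import os
from litellm import acompletion

# Display names mapped to LiteLLM identifiers (illustrative subset).
AVAILABLE_MODELS = {
    "Claude Opus 4.1": "openrouter/anthropic/claude-opus-4.1",
    "GPT-5": "gpt-5",
}

async def get_model_response_async(model_name: str, prompt: str):
    """Route GPT models to OpenAI; send everything else through OpenRouter."""
    model_mapping = AVAILABLE_MODELS[model_name]
    messages = [{"role": "user", "content": prompt}]
    if "GPT" in model_name:
        # OpenAI path: OPENAI_API_KEY and max_completion_tokens
        return await acompletion(
            model=model_mapping,
            messages=messages,
            api_key=os.getenv("OPENAI_API_KEY"),
            max_completion_tokens=2000,
            stream=True,
        )
    # OpenRouter path: OPENROUTER_API_KEY and max_tokens
    return await acompletion(
        model=model_mapping,
        messages=messages,
        api_key=os.getenv("OPENROUTER_API_KEY"),
        max_tokens=2000,
        stream=True,
    )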

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant App as App UI
  participant MS as ModelService
  participant OA as OpenAI API
  participant OR as OpenRouter API

  User->>App: Select models / submit prompt
  App->>MS: get_model_response_async(model_name, prompt)
  alt model_name contains "GPT"
    MS->>OA: acompletion(api_key=OPENAI_API_KEY, max_completion_tokens=2000)
    OA-->>MS: response stream
  else
    MS->>OR: acompletion(api_key=OPENROUTER_API_KEY, max_tokens=2000)
    OR-->>MS: response stream
  end
  MS-->>App: stream tokens / final text
  App-->>User: display comparison output
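
The final "MS-->>App: stream tokens" hop can be consumed by iterating over the LiteLLM stream. A minimal, hypothetical consumer (the function name is illustrative, not taken from the repository; it assumes get_model_response_async returns the LiteLLM async streaming response directly):

async def collect_stream(response) -> str:
    """Accumulate streamed delta content from a LiteLLM async streaming response."""
    parts = []
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta is not None and getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)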

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Add Code Eval with Opik #166: Modifies the same app default-model fallback and model_service AVAILABLE_MODELS plus routing/API-key logic, indicating a closely related update.

Poem

In CodeArena’s glow, I tap my paws,
New banners raised, with title applause.
GPT hops to OpenAI’s gate,
Others to OpenRouter’s fate.
Opus ascends where Sonnet stood—
A bunny reviews, “All understood!”
Now, compare away—hop-hop, it’s good! 🐇⚔️

@coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
code-model-comparison/model_service.py (1)

75-84: Reflect correct API key environment variable in error messages
Adjust the exception handler in code-model-comparison/model_service.py (around lines 75–84) so that invalid/missing API key errors reference OPENAI_API_KEY when routing to OpenAI and OPENROUTER_API_KEY otherwise.

• File: code-model-comparison/model_service.py
Lines: 75–84

     except Exception as e:
         error_msg = f"Error generating response: {str(e)}"
-        if "api_key" in str(e).lower() or "authentication" in str(e).lower():
-            error_msg = "Error: Invalid or missing API key. Please check your OPENROUTER_API_KEY configuration."
+        if "api_key" in str(e).lower() or "authentication" in str(e).lower():
+            # Dynamically reference the correct env var based on model routing
+            env_var = "OPENAI_API_KEY" if "GPT" in model_name else "OPENROUTER_API_KEY"
+            error_msg = (
+                f"Error: Invalid or missing API key. "
+                f"Please check your {env_var} configuration."
+            )
         elif "quota" in str(e).lower() or "limit" in str(e).lower():
             error_msg = "Error: API quota exceeded or rate limit reached. Please try again later."
         elif "model" in str(e).lower():
             error_msg = f"Error: Model '{model_name}' is not available or has issues. Please try a different model."
code-model-comparison/app.py (1)

65-71: Broken logo URL (double protocol) in “Powered by” strip.

The second image src has a duplicated https://, so the Opik logo won’t render.

-        <img src="https://files.buildwithfern.com/https://opik.docs.buildwithfern.com/docs/opik/2025-08-01T07:08:31.326Z/img/logo-dark-mode.svg" width="100">
+        <img src="https://files.buildwithfern.com/opik.docs.buildwithfern.com/docs/opik/2025-08-01T07:08:31.326Z/img/logo-dark-mode.svg" width="100">
🧹 Nitpick comments (1)
code-model-comparison/app.py (1)

91-94: Unreachable fallback branch; simplify default selection.

Because you stop execution when all_models is empty (Lines 82–85), the else "Claude Opus 4.1" fallback on Line 92 is unreachable. Simplify to avoid config drift.

-if default_model1 not in all_models:
-    default_model1 = all_models[0] if all_models else "Claude Opus 4.1"
+if default_model1 not in all_models:
+    default_model1 = all_models[0]
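
For context, a sketch of why the else branch is unreachable, assuming the Streamlit guard described above (the import of AVAILABLE_MODELS and the variable names are illustrative, not the exact app code):

import streamlit as st
from model_service import AVAILABLE_MODELS  # assumed import for illustration

all_models = list(AVAILABLE_MODELS.keys())
if not all_models:
    st.error("No models are available.")
    st.stop()  # execution stops here when the catalog is empty

default_model1 = "Claude Opus 4.1"
if default_model1 not in all_models:
    default_model1 = all_models[0]  # all_models is guaranteed non-empty here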
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11a169b and 166a064.

📒 Files selected for processing (2)
  • code-model-comparison/app.py (3 hunks)
  • code-model-comparison/model_service.py (2 hunks)
🔇 Additional comments (2)
code-model-comparison/app.py (2)

13-13: Branding/title change LGTM.


62-62: Main header change LGTM.

Comment on lines +9 to 13
"Claude Opus 4.1": "openrouter/anthropic/claude-opus-4.1",
"Qwen3-Coder": "openrouter/qwen/qwen3-coder",
"Gemini 2.5 Flash": "openrouter/google/gemini-2.5-flash",
"GPT-4.1": "openrouter/openai/gpt-4.1",
"GPT-5": "gpt-5",
}

💡 Verification agent

🧩 Analysis chain

Model catalog updates look consistent, but verify identifiers exist upstream.

"Claude Opus 4.1" and "GPT-5" entries look fine. Ensure LiteLLM recognizes these identifiers and providers as configured.

Run web checks to confirm model IDs and parameter support:

If needed, I can adjust the mapping to provider-qualified IDs (e.g., openai/gpt-5) or add a fallback route.


🌐 Web query:

Does LiteLLM support the model id "gpt-5" with acompletion, and should we pass max_tokens or max_completion_tokens for streaming?

💡 Result:

Short answer: Yes — LiteLLM supports GPT‑5 model IDs (e.g. "openai/gpt-5" or "azure/gpt-5") for its completion APIs, and for GPT‑5 you should use max_completion_tokens to control the streamed completion length (LiteLLM will also accept max_tokens and map it to max_completion_tokens for GPT‑5, but passing max_completion_tokens is clearer/explicit). (docs.litellm.ai)

Quick notes and example:

  • Supported IDs: LiteLLM lists OpenAI/Azure GPT‑5 model IDs such as openai/gpt-5 and openai/gpt-5-chat (and provider-prefixed variants). (docs.litellm.ai)
  • Params: LiteLLM exposes both max_tokens and max_completion_tokens; for GPT‑5 it maps max_tokens → max_completion_tokens, and you can (and probably should) pass max_completion_tokens when streaming so you explicitly set the completion token cap. (docs.litellm.ai)

Example (Python):
from litellm import completion

resp = completion(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Write a short summary of X."}],
    stream=True,
    max_completion_tokens=250,
)
for chunk in resp:
    print(chunk.choices[0].delta)

If you want, tell me which provider/model string you’re using (openai/…, azure/…, or bare "gpt-5") and I’ll give a tailored snippet and confirm the exact param mapping for your LiteLLM version.

Citations:


🌐 Web query:

Is "openrouter/anthropic/claude-opus-4.1" a valid model identifier on OpenRouter via LiteLLM?

💡 Result:

Short answer: No. The valid Anthropic Opus identifier on OpenRouter via LiteLLM uses the Claude 3 naming, e.g., openrouter/anthropic/claude-3-opus-20240229, not “claude-opus-4.1.”[1][3]

Details:

  • LiteLLM’s OpenRouter convention is openrouter/provider/model-name, and Anthropic’s Opus listing on OpenRouter uses the Claude 3 format such as claude-3-opus-20240229.[1]
  • LiteLLM docs state you can call any OpenRouter model by passing model="openrouter/<provider>/<model>", with examples including openrouter/anthropic/claude-2 and the Claude 3 variants, but there is no “claude-opus-4.1” pattern shown.[3]
  • A third-party page shows a string like openrouter/anthropic/claude-opus-4.1, but this appears inconsistent with OpenRouter’s established Claude 3 model identifiers and should not be relied upon without confirmation on the OpenRouter models list.[2]

Recommendation:

  • Use the confirmed identifier: openrouter/anthropic/claude-3-opus-20240229.[1]
  • If you need the latest Sonnet/Haiku/Opus variant, check the OpenRouter model list and plug it into LiteLLM as openrouter/anthropic/<model-id>.[3]

Citations:


Fix invalid LiteLLM model identifiers and use explicit max_completion_tokens

Please update the model mapping in code-model-comparison/model_service.py and streaming calls as follows:

  • In the AVAILABLE_MODELS mapping (lines 9–13):

    -    "Claude Opus 4.1": "openrouter/anthropic/claude-opus-4.1",
    +    "Claude Opus 4.1": "openrouter/anthropic/claude-3-opus-20240229",  # valid OpenRouter Anthropic Opus ID
    
    -    "GPT-5": "gpt-5",
    +    "GPT-5": "openai/gpt-5",  # provider-prefixed for LiteLLM compatibility
  • In your GPT-5 streaming completion branches (around lines 49–64):
    • Replace any usage of max_tokens with max_completion_tokens to explicitly set the streamed token cap (LiteLLM will map max_tokens → max_completion_tokens, but explicit is clearer).

    resp = completion(
      model="openai/gpt-5",
      messages=…,
      stream=True,
    - max_tokens=250
    + max_completion_tokens=250
    )

These changes ensure LiteLLM recognizes both model IDs and correctly applies the streaming token limit.

🤖 Prompt for AI Agents
In code-model-comparison/model_service.py lines 9–13 and the GPT-5 streaming
branches around lines 49–64, update the AVAILABLE_MODELS mapping so the GPT-5
entry uses the valid LiteLLM identifier ("openai/gpt-5") instead of "gpt-5", and
in all GPT-5 stream calls replace max_tokens with max_completion_tokens (e.g.,
max_completion_tokens=250) so the streamed token cap is set explicitly; keep the
rest of the call arguments the same.
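
Taken together, the mapping and the GPT branch would read roughly like this after applying the suggestion. This is a sketch of the suggested state, not the committed code; note that a later comment in this review argues for max_tokens in both branches instead, which LiteLLM maps to max_completion_tokens for GPT-5.

AVAILABLE_MODELS = {
    "Claude Opus 4.1": "openrouter/anthropic/claude-3-opus-20240229",
    "Qwen3-Coder": "openrouter/qwen/qwen3-coder",
    "Gemini 2.5 Flash": "openrouter/google/gemini-2.5-flash",
    "GPT-5": "openai/gpt-5",
}

# GPT branch with an explicit streamed-token cap
response = await acompletion(
    model=model_mapping,
    messages=messages,
    api_key=os.getenv("OPENAI_API_KEY"),
    max_completion_tokens=2000,
    stream=True,
)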
Comment on lines +49 to +64
if "GPT" in model_name:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENAI_API_KEY"),
max_completion_tokens=2000,
stream=True,
)
else:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENROUTER_API_KEY"),
max_tokens=2000,
stream=True,
)

⚠️ Potential issue

Use max_tokens for LiteLLM acompletion; align params across routes.

The GPT branch passes max_completion_tokens, which acompletion may not accept. Use max_tokens in both branches to avoid parameter mismatch and parity issues.

Apply this minimal diff:

-        if "GPT" in model_name:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENAI_API_KEY"),
-                max_completion_tokens=2000,
-                stream=True,
-            )
+        if "GPT" in model_name:
+            response = await acompletion(
+                model=model_mapping,
+                messages=messages,
+                api_key=os.getenv("OPENAI_API_KEY"),
+                max_tokens=2000,
+                stream=True,
+            )

Optionally, dedupe branches by deriving api_key and calling acompletion once:

-        if "GPT" in model_name:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENAI_API_KEY"),
-                max_tokens=2000,
-                stream=True,
-            )
-        else:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENROUTER_API_KEY"),
-                max_tokens=2000,
-                stream=True,
-            )
+        is_gpt = "GPT" in model_name
+        response = await acompletion(
+            model=model_mapping,
+            messages=messages,
+            api_key=os.getenv("OPENAI_API_KEY") if is_gpt else os.getenv("OPENROUTER_API_KEY"),
+            max_tokens=2000,
+            stream=True,
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if "GPT" in model_name:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENAI_API_KEY"),
max_completion_tokens=2000,
stream=True,
)
else:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENROUTER_API_KEY"),
max_tokens=2000,
stream=True,
)
if "GPT" in model_name:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENAI_API_KEY"),
max_tokens=2000,
stream=True,
)
else:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENROUTER_API_KEY"),
max_tokens=2000,
stream=True,
)
🤖 Prompt for AI Agents
In code-model-comparison/model_service.py around lines 49 to 64, the GPT branch
passes max_completion_tokens while the other branch uses max_tokens, causing
acompletion parameter mismatch; change the GPT branch to use max_tokens=2000
(matching the other branch) and align other parameter names, and optionally
simplify by setting api_key = os.getenv("OPENAI_API_KEY") if "GPT" in model_name
else os.getenv("OPENROUTER_API_KEY") and call acompletion once with the derived
api_key and shared args (model=model_mapping, messages=messages,
api_key=api_key, max_tokens=2000, stream=True).
