
Conversation

@recursix (Collaborator) commented on Jul 11, 2025

Description by Korbit AI

What change is being made?

Introduce a debug convenience script to launch experiments, update the goal.apply method to support custom system messages, enhance drawing capabilities with a dashed rectangle feature, and modify the agent tool selection logic in the response API.

Why are these changes being made?

These changes streamline the debugging and testing process by adding a script for experiment execution, improve model guidance flexibility with customizable messages, enhance visualization with dashed rectangles for better action overlays, and refine LLM response handling to facilitate tool choice adjustments based on API requirements.
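
The goal.apply change itself is not shown in this conversation. As a rough sketch, an optional system-message override could look like the following; the signature, class name, and discussion structure here are assumptions rather than the actual implementation:

class Goal:
    def apply(self, discussion, system_message: str | None = None):
        """Apply the goal, optionally overriding the default system message."""
        # SYS_MSG is assumed to be the module's existing default prompt constant.
        sys_msg = system_message if system_message is not None else SYS_MSG
        discussion.append({"role": "system", "content": sys_msg})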



@korbit-ai (bot) left a comment


Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category         Issue
Logging          Basic root logger configuration
Design           Agent Configuration Should Be Separated
Error Handling   Misleading error handling comment
Performance      Static Parallel Job Configuration
Logging          Debug Print Statement in Production
Functionality    Incorrect Tool Choice Parameter Handling
Readability      Hardcoded System Message Strings
Documentation    Missing explanation for removed NaN check
Functionality    Invalid GPT Model Name
Files scanned:
main_workarena_debug.py
src/agentlab/analyze/overlay_utils.py
src/agentlab/agents/tool_use_agent/tool_use_agent.py
src/agentlab/llm/response_api.py
src/agentlab/analyze/agent_xray.py


)
from agentlab.experiments.study import Study

logging.getLogger().setLevel(logging.INFO)

Basic root logger configuration (category: Logging)

What is the issue?

The root logger is used without proper configuration: no handler, formatter, or named logger.

Why this matters

Without proper logger configuration, logs may lack contextual information like timestamps and source, making debugging production issues difficult.

Suggested change
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
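
With this configuration in place, module-level log calls automatically carry a timestamp, logger name, and level; for example (the message text is illustrative):

logger.info("Launching the debug study")
# -> 2025-07-11 13:05:42 - __main__ - INFO - Launching the debug study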

Comment on lines +28 to +40
agent_configs = [
    ToolUseAgentArgs(
        model_args=GPT_4_1,
        config=config,
    ),
    # ToolUseAgentArgs(
    #     model_args=GPT_4_1,
    #     config=config,
    # ),
]

for agent_config in agent_configs:
    agent_config.config.action_subsets = ("workarena",)  # use the workarena action set

Agent Configuration Should Be Separated (category: Design)

What is the issue?

Agent configuration is mixed with experiment setup code, violating the Single Responsibility Principle.

Why this matters

Mixing configuration with setup code makes it harder to maintain different agent configurations and reduces code reusability across different experiments.

Suggested change

Extract agent configuration into a separate factory or builder class:

from copy import deepcopy

class AgentConfigFactory:
    @staticmethod
    def create_workarena_config(base_config):
        config = deepcopy(base_config)
        config.action_subsets = ("workarena",)
        return ToolUseAgentArgs(model_args=GPT_4_1, config=config)

agent_configs = [AgentConfigFactory.create_workarena_config(DEFAULT_PROMPT_CONFIG)]

Comment on lines +58 to +59
n_jobs = 10 # Make sure to use 1 job when debugging in VSCode
parallel_backend = "ray"

Static Parallel Job Configuration (category: Performance)

What is the issue?

Hard-coded number of parallel jobs without considering system resources (CPU cores, memory) could lead to suboptimal performance.

Why this matters

Setting a fixed number of jobs might either underutilize available resources or overload the system, causing overhead from context switching and memory pressure.

Suggested change

Use system information to determine optimal job count:

from multiprocessing import cpu_count
n_jobs = min(cpu_count(), 10)  # Cap at 10 but adjust based on available cores
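
Because the study above uses the "ray" backend, the local core count is only a rough proxy for available capacity. One hedged variant is to allow an environment override; AGENTLAB_N_JOBS below is a hypothetical variable name, not an existing setting:

import os
from multiprocessing import cpu_count

# Respect a hypothetical AGENTLAB_N_JOBS override if set; otherwise cap at 10.
n_jobs = int(os.environ.get("AGENTLAB_N_JOBS", min(cpu_count(), 10)))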

Comment on lines +300 to +306
if dashed:
    # Draw dashed rectangle
    print("Drawing dashed rectangle")
    linedashed(draw, x, y, x + w, y, color, width)
    linedashed(draw, x + w, y, x + w, y + h, color, width)
    linedashed(draw, x + w, y + h, x, y + h, color, width)
    linedashed(draw, x, y + h, x, y, color, width)

Debug Print Statement in Production (category: Logging)

What is the issue?

A debug print statement is left in production code for the dashed rectangle drawing.

Why this matters

Debug print statements in production code can clutter logs and make it harder to identify actual issues when they occur.

Suggested change

Remove the print statement:

if dashed:
    linedashed(draw, x, y, x + w, y, color, width)
    linedashed(draw, x + w, y, x + w, y + h, color, width)
    linedashed(draw, x + w, y + h, x, y + h, color, width)
    linedashed(draw, x, y + h, x, y, color, width)
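
The linedashed helper itself is not part of this diff; a minimal PIL-based sketch of what such a helper might look like (the dash and gap lengths are assumed parameters):

import math
from PIL import ImageDraw

def linedashed(draw: ImageDraw.ImageDraw, x0, y0, x1, y1, color, width, dash=6, gap=4):
    """Draw a dashed line from (x0, y0) to (x1, y1) as alternating segments."""
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit direction vector
    pos = 0.0
    while pos < length:
        end = min(pos + dash, length)
        # Draw one dash segment, then skip a gap before the next one.
        draw.line(
            [(x0 + ux * pos, y0 + uy * pos), (x0 + ux * end, y0 + uy * end)],
            fill=color,
            width=width,
        )
        pos = end + gap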

Comment on lines +343 to +346
if tool_choice in ("any", "required"):
    tool_choice = "required"

api_params["tool_choice"] = tool_choice

Incorrect Tool Choice Parameter Handling (category: Functionality)

What is the issue?

In OpenAIChatCompletionModel._call_api, tool_choice is coerced to 'required' when it is 'any', which conflicts with the intended behavior and documented values of the tool_choice parameter.

Why this matters

OpenAI's API expects tool_choice to be either 'none', 'auto', or an object specifying 'type': 'function' with a required name. The current implementation may lead to unexpected API behavior.

Suggested change

Replace the tool_choice handling with proper OpenAI API expected format:

if tool_choice == "any":
    api_params["tool_choice"] = {"type": "function"}
elif tool_choice == "required":
    api_params["tool_choice"] = "required"
else:
    api_params["tool_choice"] = tool_choice

Comment on lines +448 to +450
    sys_msg = SYS_MSG + "\nYou can take multiple actions in a single step, if needed."
else:
    sys_msg = SYS_MSG + "\nYou can only take one action at a time."

Hardcoded System Message Strings (category: Readability)

What is the issue?

The system message construction contains duplicated code and hardcoded strings that should be constants.

Why this matters

String concatenation in multiple places makes the code harder to maintain and increases the risk of inconsistencies.

Suggested change
MULTI_ACTION_MSG = "\nYou can take multiple actions in a single step, if needed."
SINGLE_ACTION_MSG = "\nYou can only take one action at a time."

sys_msg = SYS_MSG + (MULTI_ACTION_MSG if self.config.multiaction else SINGLE_ACTION_MSG)

summary_df = pd.read_csv(most_recent_summary)

- if len(summary_df) == 0 or summary_df["avg_reward"].isna().all():
+ if len(summary_df) == 0:

Missing explanation for removed NaN check (category: Documentation)

What is the issue?

The condition was simplified from 'len(summary_df) == 0 or summary_df["avg_reward"].isna().all()' to just 'len(summary_df) == 0' without documenting why the NaN check was removed.

Why this matters

Future maintainers won't understand why a seemingly important NaN check was removed, which could lead to reintroducing it unnecessarily.

Suggested change

if len(summary_df) == 0: # NaN check removed since empty dataframes are already filtered out


Comment on lines +1167 to 1168
if len(summary_df) == 0:
    continue  # skip if all avg_reward are NaN

Misleading error handling comment (category: Error Handling)

What is the issue?

The comment is misleading as it states 'skip if all avg_reward are NaN' but the code actually skips if the dataframe is empty.

Why this matters

Having misleading comments about error handling conditions can lead to confusion and make debugging more difficult when actual NaN values are present.

Suggested change
if len(summary_df) == 0:
    continue  # skip if dataframe is empty

Comment on lines +500 to 501
GPT_4_1 = OpenAIResponseModelArgs(
    model_name="gpt-4.1",

Invalid GPT Model Name (category: Functionality)

What is the issue?

The code references a non-existent GPT model 'gpt-4.1' which will cause runtime errors when attempting to use the model.

Why this matters

Using an invalid model name will cause API calls to fail and prevent the agent from functioning.

Suggested change

Update to use a valid OpenAI model name like 'gpt-4-turbo-preview' or 'gpt-4':

GPT_4_1 = OpenAIResponseModelArgs(
    model_name="gpt-4-turbo-preview",

@amanjaiswal73892 merged commit d92e0bf into main on Jul 11, 2025 (6 checks passed).
@amanjaiswal73892 deleted the deep_debug branch on July 11, 2025 at 20:58.
@amanjaiswal73892 restored the deep_debug branch on July 12, 2025 at 00:11.