
Conversation

@imenelydiaker (Collaborator) commented Oct 13, 2025

When deploying a VLLM server on a different node/GPU than the one running the agents, the endpoint base URL cannot be http://0.0.0.0:8000.

class VLLMChatModel(ChatModel):
    def __init__(
        self,
        model_name,
        api_key=None,
        temperature=0.5,
        max_tokens=100,
        n_retry_server=4,
        min_retry_wait_time=60,
    ):
        super().__init__(
            model_name=model_name,
            api_key=api_key,
            temperature=temperature,
            max_tokens=max_tokens,
            max_retry=n_retry_server,
            min_retry_wait_time=min_retry_wait_time,
            api_key_env_var="VLLM_API_KEY",
            client_class=OpenAI,
            client_args={"base_url": "http://0.0.0.0:8000/v1"},
        )

This PR introduces a new environment variable, VLLM_API_URL, which allows using a custom endpoint URL or falling back to the default local server.
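As a minimal usage sketch (the hostname, model name, and import path below are illustrative assumptions, not prescribed by this PR), the variable just needs to be set before the model is constructed:

import os

# Assumed address of the remote node serving VLLM; replace with your own.
os.environ["VLLM_API_URL"] = "http://gpu-node-1:8000/v1"

from agentlab.llm.chat_api import VLLMChatModel  # path per the file reviewed below

# With VLLM_API_URL unset, the client falls back to http://localhost:8000/v1.
model = VLLMChatModel(model_name="my-model")

In practice the variable would typically be exported in the shell before launching the agents; setting it in-process works here because the merged code reads it at instantiation time.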

Description by Korbit AI

What change is being made?

Allow configuring the VLLM endpoint URL via environment variable VLLM_API_URL, defaulting to http://localhost:8000/v1 if not set.

Why are these changes being made?

To enable configuring the VLLM backend URL without code changes, using a sensible default when the variable is not provided.



@korbit-ai (bot) left a comment


Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category: Performance
Issue: Repeated environment variable lookup on model instantiation

Files scanned: src/agentlab/llm/chat_api.py


  api_key_env_var="VLLM_API_KEY",
  client_class=OpenAI,
- client_args={"base_url": "http://0.0.0.0:8000/v1"},
+ client_args={"base_url": os.getenv("VLLM_API_URL", "http://localhost:8000/v1")},


Repeated environment variable lookup on model instantiation (category: Performance)

What is the issue?

The os.getenv() call is executed on every VLLMChatModel instantiation, performing an unnecessary environment variable lookup each time.

Why this matters

This creates redundant system calls when multiple VLLMChatModel instances are created, as the environment variable is unlikely to change during program execution. The overhead becomes more significant in scenarios with frequent model instantiation.

Suggested change

Cache the environment variable lookup at module level or class level to avoid repeated os.getenv() calls:

# At module level
VLLM_BASE_URL = os.getenv("VLLM_API_URL", "http://localhost:8000/v1")

# Then in __init__:
client_args={"base_url": VLLM_BASE_URL}
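A hedged alternative, if the variable should remain settable after import: cache the lookup lazily rather than at import time (a sketch, not part of this PR; the helper name is hypothetical):

import os
from functools import lru_cache

@lru_cache(maxsize=1)
def _vllm_base_url() -> str:
    # Read VLLM_API_URL once on first use, then reuse the cached value.
    return os.getenv("VLLM_API_URL", "http://localhost:8000/v1")

# Then in __init__:
# client_args={"base_url": _vllm_base_url()}

The trade-off: a module-level constant fixes the URL at import time, so anything that sets VLLM_API_URL afterwards (as in the sketch above) would be ignored; the lazy variant defers the read to first instantiation while still avoiding repeated lookups.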



@imenelydiaker deleted the vllm-config branch October 14, 2025 01:03
@imenelydiaker restored the vllm-config branch October 14, 2025 01:03
@imenelydiaker reopened this Oct 14, 2025
@amanjaiswal73892 (Collaborator) left a comment


LGTM

@amanjaiswal73892 merged commit 9b6a33f into ServiceNow:main Oct 14, 2025
13 checks passed
