
Conversation

@devin-ai-integration
Contributor

Fix Anthropic max_tokens issue causing slow execution (issue #3807)

Summary

This PR fixes issue #3807, where Anthropic models always hit the max_tokens limit, causing extremely slow execution times.

Root Cause: In v1.0+, max_tokens defaulted to 4096 and was always passed to the Anthropic API, causing the model to generate up to 4096 tokens even for simple queries that should only need a few tokens (e.g., "Answer in one sentence: What is CrewAI?").

The Fix:

  • Changed max_tokens parameter from int = 4096 to int | None = None (optional)
  • Added dynamic computation in _prepare_completion_params() (see the sketch after this list):
    • Default: 1024 tokens (4x smaller than before)
    • Large context models (200k+): 2048 tokens (2x smaller than before)
    • User-specified values are always respected unchanged
  • This aligns with v0.203.1 behavior where max_tokens was optional
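
A minimal sketch of that dynamic computation, under the assumptions stated in this description (resolve_max_tokens is an illustrative helper name, not the actual method; the real logic lives in _prepare_completion_params(), and the 100k threshold is taken from the review checklist below):

# Illustrative sketch only; values mirror this PR's description, not the exact source.
DEFAULT_MAX_TOKENS = 1024
LARGE_CONTEXT_MAX_TOKENS = 2048
LARGE_CONTEXT_THRESHOLD = 100_000

def resolve_max_tokens(user_max_tokens: int | None, context_window: int) -> int:
    if user_max_tokens is not None:
        return user_max_tokens  # explicit user values pass through unchanged
    if context_window > LARGE_CONTEXT_THRESHOLD:
        return LARGE_CONTEXT_MAX_TOKENS  # 200k-class models get 2048
    return DEFAULT_MAX_TOKENS  # everything else gets 1024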

Changes:

  • lib/crewai/src/crewai/llms/providers/anthropic/completion.py: Modified __init__ and _prepare_completion_params
  • lib/crewai/tests/llms/anthropic/test_anthropic.py: Added 5 comprehensive tests covering all max_tokens scenarios (illustrated below)
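
As an illustration of the scenarios those tests cover (hypothetical pytest-style checks written against the resolve_max_tokens sketch above; the real tests mock the Anthropic client and exercise _prepare_completion_params through the completion class):

def test_explicit_max_tokens_passed_through():
    assert resolve_max_tokens(8192, 200_000) == 8192  # user value wins

def test_dynamic_default_standard_context():
    assert resolve_max_tokens(None, 8_192) == 1024

def test_dynamic_default_large_context():
    assert resolve_max_tokens(None, 200_000) == 2048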

Review & Testing Checklist for Human

IMPORTANT: All tests in this PR use mocks. Real API testing is critical to validate that the fix actually works.

  • Test with real Anthropic API key: Create a simple crew/agent with Anthropic LLM and verify that:

    • Simple queries (like "Answer in one sentence: what is 2+2?") complete quickly
    • Check the Anthropic console to confirm output tokens are reasonable (not hitting the 1024/2048 limit)
    • Compare execution time to v0.203.1 if possible
  • Verify dynamic max_tokens values are appropriate: The defaults (1024/2048) are somewhat arbitrary. Consider whether they should be:

    • Smaller (e.g., 512/1024)?
    • Configurable via environment variable (see the sketch after this list)?
    • Based on input token count rather than just context window size?
  • Check backwards compatibility: Verify that users who explicitly set max_tokens (e.g., LLM(model="anthropic/...", max_tokens=8192)) still get their specified value passed through unchanged

  • Review the context window threshold: The 100k threshold for deciding between 1024 vs 2048 tokens is hardcoded. Verify this makes sense for the Claude model family.
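
If the environment-variable route were taken, it could look like the sketch below. The variable names CREWAI_ANTHROPIC_MAX_TOKENS and CREWAI_ANTHROPIC_MAX_TOKENS_LARGE are hypothetical, not existing settings:

import os

# Hypothetical env-var overrides for the dynamic defaults; these variable
# names do not exist in crewai today and are shown only as a sketch.
DEFAULT_MAX_TOKENS = int(os.environ.get("CREWAI_ANTHROPIC_MAX_TOKENS", "1024"))
LARGE_CONTEXT_MAX_TOKENS = int(os.environ.get("CREWAI_ANTHROPIC_MAX_TOKENS_LARGE", "2048"))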

Suggested Test Plan

# Test 1: Verify a simple query doesn't hit max_tokens
from crewai import LLM  # import needed to run this plan

llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
result = llm.call("Answer in one sentence: What is CrewAI?")
# Check Anthropic console: output tokens should be ~20-50, not 1024

# Test 2: Verify explicit max_tokens still works  
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022", max_tokens=512)
result = llm.call("Write a long essay about AI")
# Check Anthropic console: should stop at 512 tokens
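
A third check could confirm the dynamic default end-to-end. This is a suggested addition, not part of the original plan; the 2048 ceiling assumes claude-3-5-sonnet falls in the large-context bucket described above:

# Test 3 (suggested addition): verify the dynamic default applies when
# max_tokens is left unset
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
result = llm.call("Write a long essay about AI")
# Check Anthropic console: should stop at 2048 tokens for a 200k-context model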

Notes

  • All existing tests pass (30/30 in test_anthropic.py)
  • Linter (ruff) passes with auto-fixes applied
  • Type checker (mypy) was attempted but hung during local testing; CI will validate

Session Info: Requested by João (joao@crewai.com)
Devin Session: https://app.devin.ai/sessions/4a9ba7830b2f45f9b6ce79a9ff43cf14

Make max_tokens optional and compute dynamically when not set by user.

Previously, max_tokens defaulted to 4096 and was always passed to the
Anthropic API, causing the model to generate up to 4096 tokens even for
simple queries that should only need a few tokens. This resulted in
extremely slow execution times.

Changes:
- Changed max_tokens parameter from int (default 4096) to int | None (default None)
- Added dynamic computation in _prepare_completion_params():
  - Default: 1024 tokens (much more reasonable for most queries)
  - Large context models (200k+): 2048 tokens
  - User-specified values are always respected
- Updated docstring to reflect that max_tokens is now optional
- Added comprehensive tests covering:
  - Explicit max_tokens values are passed through unchanged
  - Default behavior computes reasonable max_tokens dynamically
  - max_tokens=None uses dynamic computation
  - Dynamic values are appropriate for model context window size
  - User-provided values are always respected

This fix aligns with the v0.203.1 behavior where max_tokens was optional
and only passed when explicitly set, while maintaining compatibility with
the Anthropic SDK requirement that max_tokens must be provided.
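
For context, the Anthropic Messages API does require max_tokens on every request, which is why the provider must always send some value. A minimal sketch using the public anthropic SDK:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# max_tokens is a required parameter of messages.create(), so the provider
# has to compute one even when the user leaves it unset.
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Answer in one sentence: What is CrewAI?"}],
)
print(message.content[0].text)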

Fixes #3807

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring