
Conversation

@devin-ai-integration
Contributor

Fix Anthropic max_tokens issue causing slow execution (issue #3807)

Summary

This PR fixes issue #3807, where Anthropic models always hit the max_tokens limit, causing extremely slow execution times.

Root Cause: In v1.0+, max_tokens defaulted to 4096 and was always passed to the Anthropic API, causing the model to generate up to 4096 tokens even for simple queries that should only need a few tokens (e.g., "Answer in one sentence: What is CrewAI?").

The Fix:

  • Changed max_tokens parameter from int = 4096 to int | None = None (optional)
  • Added dynamic computation in _prepare_completion_params() (see the sketch after this list):
    • Default: 1024 tokens (4x smaller than before)
    • Large context models (200k+): 2048 tokens (2x smaller than before)
    • User-specified values are always respected unchanged
  • This aligns with v0.203.1 behavior where max_tokens was optional
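
A minimal sketch of that dynamic computation, under the assumptions stated in this description (resolve_max_tokens is an illustrative helper name, not the actual method; the real logic lives in _prepare_completion_params(), and the 100k threshold is taken from the review checklist below):

# Illustrative sketch only; values mirror this PR's description, not the exact source.
DEFAULT_MAX_TOKENS = 1024
LARGE_CONTEXT_MAX_TOKENS = 2048
LARGE_CONTEXT_THRESHOLD = 100_000

def resolve_max_tokens(user_max_tokens: int | None, context_window: int) -> int:
    if user_max_tokens is not None:
        return user_max_tokens  # explicit user values pass through unchanged
    if context_window > LARGE_CONTEXT_THRESHOLD:
        return LARGE_CONTEXT_MAX_TOKENS  # 200k-class models get 2048
    return DEFAULT_MAX_TOKENS  # everything else gets 1024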

Changes:

  • lib/crewai/src/crewai/llms/providers/anthropic/completion.py: Modified __init__ and _prepare_completion_params
  • lib/crewai/tests/llms/anthropic/test_anthropic.py: Added 5 comprehensive tests covering all max_tokens scenarios (illustrated below)
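
As an illustration of the scenarios those tests cover (hypothetical pytest-style checks written against the resolve_max_tokens sketch above; the real tests mock the Anthropic client and exercise _prepare_completion_params through the completion class):

def test_explicit_max_tokens_passed_through():
    assert resolve_max_tokens(8192, 200_000) == 8192  # user value wins

def test_dynamic_default_standard_context():
    assert resolve_max_tokens(None, 8_192) == 1024

def test_dynamic_default_large_context():
    assert resolve_max_tokens(None, 200_000) == 2048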

Review & Testing Checklist for Human

IMPORTANT: All tests in this PR use mocks. Real API testing is critical to validate that the fix actually works.

  • Test with real Anthropic API key: Create a simple crew/agent with Anthropic LLM and verify that:

    • Simple queries (like "Answer in one sentence: what is 2+2?") complete quickly
    • Check the Anthropic console to confirm output tokens are reasonable (not hitting the 1024/2048 limit)
    • Compare execution time to v0.203.1 if possible
  • Verify dynamic max_tokens values are appropriate: The defaults (1024/2048) are somewhat arbitrary. Consider whether they should be:

    • Smaller (e.g., 512/1024)?
    • Configurable via environment variable (see the sketch after this list)?
    • Based on input token count rather than just context window size?
  • Check backwards compatibility: Verify that users who explicitly set max_tokens (e.g., LLM(model="anthropic/...", max_tokens=8192)) still get their specified value passed through unchanged

  • Review the context window threshold: The 100k threshold for deciding between 1024 vs 2048 tokens is hardcoded. Verify this makes sense for the Claude model family.
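
If the environment-variable route were taken, it could look like the sketch below. The variable names CREWAI_ANTHROPIC_MAX_TOKENS and CREWAI_ANTHROPIC_MAX_TOKENS_LARGE are hypothetical, not existing settings:

import os

# Hypothetical env-var overrides for the dynamic defaults; these variable
# names do not exist in crewai today and are shown only as a sketch.
DEFAULT_MAX_TOKENS = int(os.environ.get("CREWAI_ANTHROPIC_MAX_TOKENS", "1024"))
LARGE_CONTEXT_MAX_TOKENS = int(os.environ.get("CREWAI_ANTHROPIC_MAX_TOKENS_LARGE", "2048"))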

Suggested Test Plan

# Test 1: Verify a simple query doesn't hit max_tokens
from crewai import LLM  # import needed to run this plan

llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
result = llm.call("Answer in one sentence: What is CrewAI?")
# Check Anthropic console: output tokens should be ~20-50, not 1024

# Test 2: Verify explicit max_tokens still works  
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022", max_tokens=512)
result = llm.call("Write a long essay about AI")
# Check Anthropic console: should stop at 512 tokens
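
A third check could confirm the dynamic default end-to-end. This is a suggested addition, not part of the original plan; the 2048 ceiling assumes claude-3-5-sonnet falls in the large-context bucket described above:

# Test 3 (suggested addition): verify the dynamic default applies when
# max_tokens is left unset
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
result = llm.call("Write a long essay about AI")
# Check Anthropic console: should stop at 2048 tokens for a 200k-context model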

Notes

  • All existing tests pass (30/30 in test_anthropic.py)
  • Linter (ruff) passes with auto-fixes applied
  • Type checker (mypy) was attempted but hung during local testing; CI will validate

Session Info: Requested by João (joao@crewai.com)
Devin Session: https://app.devin.ai/sessions/4a9ba7830b2f45f9b6ce79a9ff43cf14

Make max_tokens optional and compute dynamically when not set by user.

Previously, max_tokens defaulted to 4096 and was always passed to the
Anthropic API, causing the model to generate up to 4096 tokens even for
simple queries that should only need a few tokens. This resulted in
extremely slow execution times.

Changes:
- Changed max_tokens parameter from int (default 4096) to int | None (default None)
- Added dynamic computation in _prepare_completion_params():
  - Default: 1024 tokens (much more reasonable for most queries)
  - Large context models (200k+): 2048 tokens
  - User-specified values are always respected
- Updated docstring to reflect that max_tokens is now optional
- Added comprehensive tests covering:
  - Explicit max_tokens values are passed through unchanged
  - Default behavior computes reasonable max_tokens dynamically
  - max_tokens=None uses dynamic computation
  - Dynamic values are appropriate for model context window size
  - User-provided values are always respected

This fix aligns with the v0.203.1 behavior where max_tokens was optional
and only passed when explicitly set, while maintaining compatibility with
the Anthropic SDK requirement that max_tokens must be provided.
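
For context, the Anthropic Messages API does require max_tokens on every request, which is why the provider must always send some value. A minimal sketch using the public anthropic SDK:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# max_tokens is a required parameter of messages.create(), so the provider
# has to compute one even when the user leaves it unset.
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Answer in one sentence: What is CrewAI?"}],
)
print(message.content[0].text)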

Fixes #3807

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring