How do I troubleshoot latency and optimize Amazon Bedrock Agents performance?

I want to troubleshoot high latency in Amazon Bedrock Agents and optimize my agent's performance.

Short description

Amazon Bedrock Agents use reasoning capabilities to retrieve information from knowledge bases, and this multi-step processing might result in high latency. The following factors affect Amazon Bedrock Agents response times:

Model size
Prompt structure and complexity
Number of input and output tokens
Network connectivity and AWS Regional infrastructure

Resolution

Troubleshoot agent latency

Update your model size

If you use large foundation models (FMs) and experience high latency, then use smaller models for latency-sensitive use cases. Invocation latency also scales with the combined number of input and output tokens. To reduce the number of output tokens, instruct your agent to give clear, short responses to user queries.

Note: Amazon Bedrock automatically publishes the InvocationLatency metric in the AWS/Bedrock namespace in Amazon CloudWatch. For more information, see Monitoring the performance of Amazon Bedrock.
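
To check the latency that your model reports, you can query the metric from a script. The following Python example is a minimal sketch, not an official procedure. It assumes that you pass in a boto3 CloudWatch client and that the metric is filtered by the ModelId dimension. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_latency_query(model_id, hours=1, period_seconds=300):
    """Build get_metric_statistics arguments for Bedrock invocation latency."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",  # reported in milliseconds
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


def summarize_latency(cloudwatch, model_id, hours=1):
    """Return (timestamp, average_ms, max_ms) tuples, oldest first.

    `cloudwatch` is assumed to be a boto3 CloudWatch client, for example
    boto3.client("cloudwatch").
    """
    response = cloudwatch.get_metric_statistics(**build_latency_query(model_id, hours))
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"], p["Maximum"]) for p in points]
```

Call summarize_latency with the model ID that your agent uses, and compare the values before and after you switch to a smaller model.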

Review your orchestration strategies

Latency increases with the number of model invocations that the agent makes for each request. Make sure that you define your agent's instructions, functions, and parameters efficiently. Steps such as pre-processing, orchestration, and post-processing might each add processing time. If you receive slow responses and your use case doesn't require the pre-processing or post-processing steps, then turn off those steps to reduce latency.

To turn off the pre-processing and post-processing templates, complete the following steps:

Open the Amazon Bedrock console in the AWS Region where your agent is deployed.
In the navigation pane, expand Builder tools and choose Agents.
Select the agent, and then choose Edit in Agent Builder.
In the Orchestration strategy details section, choose Edit.
Choose the Pre-processing tab and turn off Activate pre-processing template.
Choose the Post-processing tab and turn off Activate post-processing template.
Choose Save and exit.

If you use custom orchestration, then configure the orchestration to keep the number of steps and model invocations low. Provide clear instructions. Redundant or ambiguous instructions lengthen the prompt and might cause extra reasoning steps, which increases response time.

Check your Amazon Bedrock network configuration

If your AWS Lambda function is attached to a virtual private cloud (VPC) and you experience slow network interactions with Amazon Bedrock, then the traffic might route through the public internet. To resolve this issue, use AWS PrivateLink to set up private access to Amazon Bedrock through interface VPC endpoints.
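
The following Python example sketches how you might create an interface endpoint for private access to Amazon Bedrock. It assumes that you pass in a boto3 EC2 client and that the agent runtime endpoint (bedrock-agent-runtime) is the service that your Lambda function calls. The function names and the default service are assumptions for illustration.

```python
def bedrock_endpoint_params(region, vpc_id, subnet_ids, security_group_ids,
                            service="bedrock-agent-runtime"):
    """Build create_vpc_endpoint arguments for an Amazon Bedrock interface endpoint.

    `service` is an assumption: use "bedrock-runtime" instead if your function
    invokes models directly rather than through an agent.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets existing SDK calls resolve to the endpoint unchanged.
        "PrivateDnsEnabled": True,
    }


def create_bedrock_endpoint(ec2, **kwargs):
    """Create the endpoint and return its ID.

    `ec2` is assumed to be a boto3 EC2 client, for example boto3.client("ec2").
    """
    response = ec2.create_vpc_endpoint(**bedrock_endpoint_params(**kwargs))
    return response["VpcEndpoint"]["VpcEndpointId"]
```

The security groups that you attach to the endpoint must allow inbound HTTPS (port 443) traffic from your Lambda function's subnets.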

Activate cross-Region inference

If you experience latency during periods of high Regional demand, then you might have reached a Regional capacity bottleneck. To resolve this issue, use cross-Region inference (CRIS) to increase throughput and distribute inference workloads across multiple Regions.
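
Cross-Region inference is used through inference profiles. The following Python example is a sketch of how you might look up the system-defined profiles that match a model. It assumes that you pass in a boto3 Amazon Bedrock client and that the model name appears in the profile ID. The function name and the keyword match are assumptions for illustration.

```python
def find_inference_profiles(bedrock, model_keyword):
    """Return IDs of system-defined inference profiles whose ID contains model_keyword.

    `bedrock` is assumed to be a boto3 Bedrock control-plane client, for example
    boto3.client("bedrock").
    """
    matches = []
    token = None
    while True:
        kwargs = {"typeEquals": "SYSTEM_DEFINED"}
        if token:
            kwargs["nextToken"] = token
        page = bedrock.list_inference_profiles(**kwargs)
        for summary in page.get("inferenceProfileSummaries", []):
            if model_keyword in summary["inferenceProfileId"]:
                matches.append(summary["inferenceProfileId"])
        # Keep paging until the service stops returning a continuation token.
        token = page.get("nextToken")
        if not token:
            return matches
```

You can then select a matching inference profile in place of the single-Region model ID for your agent.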

Optimize agent performance

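Activate streaming responses

If your outputs have high token counts and you don't stream responses, then users must wait for the full response before they see any output. To check whether you have high token counts, use the OutputTokenCount metric in CloudWatch. To activate streaming responses, use the InvokeModelWithResponseStream API so that content arrives as it's generated. When you call an agent through the InvokeAgent API, set streamFinalResponse in streamingConfigurations to stream the agent's final response.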
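
The following Python example is a minimal sketch of how streaming might look from the client side. The text above names InvokeModelWithResponseStream, which streams direct model invocations. This sketch instead assumes that you call the agent through the InvokeAgent API and request streaming with its streamingConfigurations setting. It also assumes that you pass in a boto3 agent runtime client. The function name is hypothetical.

```python
import codecs


def stream_agent_reply(agent_runtime, agent_id, agent_alias_id, session_id, text):
    """Yield pieces of the agent's final response as they arrive.

    `agent_runtime` is assumed to be a boto3 client created with
    boto3.client("bedrock-agent-runtime").
    """
    response = agent_runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=text,
        streamingConfigurations={"streamFinalResponse": True},
    )
    # An incremental decoder tolerates a multi-byte character split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            piece = decoder.decode(chunk["bytes"])
            if piece:
                yield piece
```

Print or forward each piece as it's yielded so that users see the start of the response before generation finishes.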

Configure your knowledge base

If you don't limit the number of document chunks that your knowledge base returns, then the model must process more context and the response generation time increases. To resolve this issue, use the numberOfResults parameter to limit the number of document chunks that are retrieved from your knowledge base.
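
The following Python example sketches one way to set numberOfResults for each request. It assumes that you pass the returned dictionary as the sessionState argument of the InvokeAgent API. The function name and the default of three chunks are assumptions for illustration, not recommended values.

```python
def kb_session_state(knowledge_base_id, max_chunks=3):
    """Build a sessionState value that caps the chunks retrieved per query."""
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    return {
        "knowledgeBaseConfigurations": [
            {
                "knowledgeBaseId": knowledge_base_id,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": max_chunks}
                },
            }
        ]
    }
```

Start with a small value, and increase it only if the responses lack the context that they need.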

Related information

Monitor model invocation using CloudWatch Logs and Amazon S3

Topics: Machine Learning & AI Generative AI on AWS
Tags: Amazon Bedrock
Language: English

AWS OFFICIAL, updated 2 days ago
