Added Conversation Compression System #1343
base: main
Conversation
tim-inkeep commented Dec 19, 2025
- New compression services: Add ConversationCompressor and update conversation history handling to use getConversationHistoryWithCompression()
- Code organization: Extract generation logic into helper methods (setupGenerationContext, loadToolsAndPrompts, buildConversationHistory, etc.) to reduce complexity in main generateResponse method
- JSON post-processing: Add withJsonPostProcessing utility for better response handling
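For orientation, here is a hypothetical call-site sketch. The function name comes from this PR, but the parameter names and the return shape shown here are assumptions, not the actual signature:

```ts
// Hypothetical usage; options are illustrative, not the PR's actual API.
import { getConversationHistoryWithCompression } from './data/conversations'; // path assumed

const history = await getConversationHistoryWithCompression({
  conversationId: 'conv_123',
  tenantId: 'tenant_1',
  projectId: 'proj_1',
});
// Expected effect: recent messages are returned verbatim, while older ones are
// folded into a summary plus artifacts once the model's token threshold is crossed.
```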
🦋 Changeset detected: latest commit 5fda026. The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages.
Claude finished @tim-inkeep's task

Code Review: Conversation Compression System

Having conducted a comprehensive analysis of this conversation compression system PR, I'll provide feedback focusing on the most critical architectural and engineering considerations.
Architecture & System Design Review

✅ Strong Architectural Decisions

1. Inheritance-Based Compression Hierarchy
2. Clean Integration with Existing Message Flow

🚨 Critical Areas for Improvement

1. Memory Management & Potential Memory Leaks

```ts
// BaseCompressor.ts:36-37
protected processedToolCalls = new Set<string>();
protected cumulativeSummary: ConversationSummary | null = null;
```

Issue: These class-level sets and objects will grow indefinitely during long-running conversations. There's no cleanup mechanism.

Recommendation: Implement periodic cleanup, either a TTL mechanism or a cap on retained entries:

```ts
// Add to BaseCompressor
private cleanup() {
  // The Set stores only tool-call IDs (no timestamps), so age-based eviction
  // isn't possible here; keep the 100 most recently seen IDs instead.
  this.processedToolCalls = new Set(Array.from(this.processedToolCalls).slice(-100));
}
```

2. Race Conditions in Compression Logic

```ts
// conversations.ts:500-506
const newMessagesCompressed = await compressConversationIfNeeded(messagesAfterCompression, {
  conversationId, tenantId, projectId, summarizerModel, streamRequestId,
});
```

Issue: Multiple concurrent requests could trigger compression simultaneously for the same conversation, potentially creating duplicate compression summaries.

Recommendation: Implement conversation-level locking or atomic compression checks using database constraints.
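One hedged way to implement that guard, assuming a Postgres backend and the `pg` driver (the repo's actual data layer isn't shown in this thread), is a transaction-scoped advisory lock keyed on the conversation ID. `compressConversationIfNeeded` is the PR's helper; everything else here is illustrative:

```ts
// Sketch only: assumes Postgres + node-postgres; adapt to the project's real db client.
import { Pool } from 'pg';
import { compressConversationIfNeeded } from './conversations'; // path assumed

const pool = new Pool();

async function compressWithLock(conversationId: string, messages: unknown[]) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // Advisory xact lock keyed on the conversation; released automatically at COMMIT/ROLLBACK.
    const { rows } = await client.query(
      'SELECT pg_try_advisory_xact_lock(hashtext($1)) AS locked',
      [conversationId],
    );
    if (!rows[0].locked) {
      await client.query('ROLLBACK');
      return messages; // another request is already compressing this conversation; skip
    }
    const compressed = await compressConversationIfNeeded(messages, { conversationId });
    await client.query('COMMIT');
    return compressed;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

An advisory lock avoids schema changes; a unique constraint on (conversation_id, compression_generation) would be the stricter database-level alternative.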
3. Token Estimation Accuracy

```ts
// BaseCompressor.ts:58-62
protected estimateTokens(content: any): number {
  const text = typeof content === 'string' ? content : JSON.stringify(content);
  return Math.ceil(text.length / 4);
}
```

Issue: The 4:1 character-to-token ratio is a rough approximation that can be significantly inaccurate, especially for code, JSON, or multilingual content. This could lead to under- or over-compression.

Recommendation: Use the actual model's tokenizer when available, or implement model-specific estimation:

```ts
// Consider using tiktoken for OpenAI models or similar libraries for other providers
protected estimateTokens(content: any, model?: string): number {
  const text = typeof content === 'string' ? content : JSON.stringify(content);
  // Use the actual tokenizer when available
  if (this.baseModel?.provider === 'openai' && tiktoken) {
    return tiktoken.encode(text).length;
  }
  // Fall back to model-specific ratios
  return this.getModelSpecificEstimate(text, model);
}
```

4. Error Boundary & Fallback Handling

Recommendation: Add fallback mechanisms:

```ts
async compress(messages: any[]): Promise<CompressionResult> {
  try {
    return await this.performCompression(messages);
  } catch (error) {
    logger.error({ error }, 'Compression failed, falling back to simple truncation');
    return this.fallbackToSimpleCompression(messages);
  }
}
```
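The `fallbackToSimpleCompression` referenced above isn't defined in the review. A minimal sketch, assuming the goal is to keep the system prompt plus the most recent messages when summarization fails (the message shape and `CompressionResult` fields here are assumptions, not the PR's actual types):

```ts
// Sketch: naive truncation fallback that needs no LLM call.
private fallbackToSimpleCompression(messages: any[]): CompressionResult {
  const KEEP_RECENT = 20; // assumed window size
  const system = messages.filter((m) => m.role === 'system');
  const recent = messages.filter((m) => m.role !== 'system').slice(-KEEP_RECENT);
  return {
    messages: [...system, ...recent],
    summary: null, // no summary is produced on this degraded path
    compressed: messages.length > system.length + recent.length,
  };
}
```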
🏗️ Code Organization & Maintainability

1. Agent.ts Refactoring Quality

Recommendation: Consider further extraction:

```ts
// Extract phase execution into separate methods
private async executePhase1(config: Phase1Config): Promise<GenerationResult>
private async executePhase2(config: Phase2Config): Promise<StructuredResult>
```

2. Tool Result Artifact Creation Logic

The saveToolResultsAsArtifacts method (BaseCompressor.ts:112-325) is doing too many things: format conversion, filtering, deduplication, and artifact creation. It's 200+ lines and violates the single-responsibility principle.

Recommendation: Break it into smaller, focused methods:

```ts
private async processMessage(message: any): Promise<ToolCallArtifact[]>
private async createArtifactForToolCall(toolCall: ToolCall): Promise<string>
private shouldSkipToolCall(toolName: string): boolean
```

🔒 Security & Data Handling

1. Tool Result Content Sanitization

While structure hints are removed (removeStructureHints, BaseCompressor.ts:436-456), there's no validation for potentially sensitive content in tool results before compression.

Recommendation: Add content sanitization before compression:

```ts
private sanitizeToolResult(result: any): any {
  // Remove potential secrets, API keys, etc.
  return this.recursivelyCleanSensitiveData(result);
}
```
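`recursivelyCleanSensitiveData` is only named above, not defined. A possible sketch, assuming key-name-based redaction is acceptable (real secret detection would need entropy checks or a dedicated scanner, and the key pattern below is an assumption to tune):

```ts
// Sketch: redact values whose key names look sensitive.
private static readonly SENSITIVE_KEY = /(secret|token|api[_-]?key|password|authorization)/i;

private recursivelyCleanSensitiveData(value: any): any {
  if (Array.isArray(value)) {
    return value.map((v) => this.recursivelyCleanSensitiveData(v));
  }
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [
        k,
        BaseCompressor.SENSITIVE_KEY.test(k)
          ? '[REDACTED]'
          : this.recursivelyCleanSensitiveData(v),
      ]),
    );
  }
  return value; // primitives pass through unchanged
}
```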
2. Artifact Metadata Consistency

⚡ Performance Considerations

1. N+1 Query Pattern

```ts
// BaseCompressor.ts:195-199
const existingArtifacts = await getLedgerArtifacts(dbClient)({
  scopes: { tenantId: this.tenantId, projectId: this.projectId },
  toolCallId: block.toolCallId,
});
```

This issues an individual database query for each tool call, leading to N+1 performance issues.

Recommendation: Batch the artifact lookups:
```ts
// Collect all toolCallIds first, then query in batch
const toolCallIds = extractToolCallIds(messages);
const existingArtifacts = await getLedgerArtifactsBatch(dbClient)({
  scopes: { tenantId: this.tenantId, projectId: this.projectId },
  toolCallIds,
});
```

2. Context Size Calculation Efficiency

📋 Testing & Documentation Gaps

1. Missing Edge Case Tests
2. Documentation for Model-Specific Behavior

🔄 API Contracts & Backwards Compatibility

1. Message Format Evolution

Recommendation: Add a database migration for existing conversations:
```sql
-- Add migration to handle existing conversations without compression metadata
ALTER TABLE conversations ADD COLUMN compression_version INTEGER DEFAULT 1;
```

🎯 Specific Recommendations for Implementation

📊 Overall Assessment

This is a well-architected feature that addresses a real need for conversation context management. The inheritance hierarchy and integration approach are sound. However, the implementation has several production-readiness concerns around memory management, race conditions, and error handling that should be addressed before merge. The refactoring of Agent.ts improves maintainability, and the compression system provides a solid foundation for scaling to longer conversations.

Severity Level: Medium-High. The core functionality is solid, but the production reliability concerns (memory leaks, race conditions, error handling) need to be addressed to ensure system stability under load.
@claude Can you dive into these considerations more? Are you sure our cleanup procedure and our larger get endpoints don't handle this better?
Claude finished @tim-inkeep's task in 2m 11s

Deep Dive: Remaining Considerations

After examining the code more carefully, I'm happy to report that several of my initial concerns have been addressed better than I initially realized.

✅ Good News: Several Concerns Are Already Addressed!

1. ✅ Memory Management IS Properly Handled

I found comprehensive cleanup procedures already in place:

```ts
// Three levels of cleanup are available:
cleanup(options: { resetSummary?: boolean; keepRecentToolCalls?: number })
partialCleanup() // Keeps last 50 tool calls
fullCleanup()    // Complete reset including summary
```
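A sketch of how these three entry points might relate, assuming `partialCleanup` and `fullCleanup` are thin wrappers over `cleanup` (the actual method bodies aren't shown in this thread):

```ts
// Assumed relationship between the three cleanup levels.
cleanup(options: { resetSummary?: boolean; keepRecentToolCalls?: number } = {}) {
  const keep = options.keepRecentToolCalls ?? 50;
  // slice(-0) would keep everything, so handle keep === 0 explicitly.
  const kept = keep > 0 ? Array.from(this.processedToolCalls).slice(-keep) : [];
  this.processedToolCalls = new Set(kept);
  if (options.resetSummary) {
    this.cumulativeSummary = null;
  }
}

partialCleanup() {
  this.cleanup({ keepRecentToolCalls: 50 }); // keeps last 50 tool calls
}

fullCleanup() {
  this.cleanup({ resetSummary: true, keepRecentToolCalls: 0 }); // complete reset
}
```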
AgentSession also properly cleans up:

- Clears all Maps (pendingArtifacts, artifactProcessingErrors, artifactCache)
- Cancels timers (statusUpdateTimer, scheduledTimeouts)
- Calls static ArtifactService.clearCaches()

The session cleanup covers these structures thoroughly.

2. ✅ N+1 Query Pattern IS SOLVED

I was wrong about this! The implementation already uses batch queries:

```ts
// BaseCompressor.ts:123-125
const toolCallIds = this.extractToolCallIds(messagesToProcess);
const existingArtifacts = await this.batchFindExistingArtifacts(toolCallIds);
// ledgerArtifacts.ts:328 - Native batch support
getLedgerArtifacts({ toolCallIds: string[] }) // Uses inArray for batch query
```
| Model | Context Window | Conversation Threshold | Sub-Agent Generation Threshold |
|-------|----------------|------------------------|--------------------------------|
| GPT-5.1 | 400K tokens | 200K (50%) | ~332K (83%) |
| Claude-4.5 Sonnet | 200K tokens | 100K (50%) | ~166K (83%) |
We update models so frequently. Can you drive this chart from our agents-core package definitions of the models and their properties?
I don't want the docs to become out of date almost immediately
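One hedged way to honor that request, assuming agents-core exports a model registry with context-window metadata (the `MODELS` export and the `contextWindow` field are hypothetical names, and the 50%/83% ratios are taken from the table above), is a small doc-generation script:

```ts
// Hypothetical doc generator; the agents-core export shape is assumed.
import { MODELS } from '@inkeep/agents-core'; // assumed export

interface ModelDef {
  name: string;
  contextWindow: number; // tokens
}

const CONVERSATION_RATIO = 0.5; // 50% threshold from the docs table
const SUBAGENT_RATIO = 0.83;    // ~83% threshold from the docs table

function fmt(tokens: number): string {
  return `${Math.round(tokens / 1000)}K`;
}

export function renderThresholdTable(models: ModelDef[]): string {
  const rows = models.map(
    (m) =>
      `| ${m.name} | ${fmt(m.contextWindow)} tokens | ` +
      `${fmt(m.contextWindow * CONVERSATION_RATIO)} (50%) | ` +
      `~${fmt(m.contextWindow * SUBAGENT_RATIO)} (83%) |`,
  );
  return [
    '| Model | Context Window | Conversation Threshold | Sub-Agent Generation Threshold |',
    '|-------|----------------|------------------------|--------------------------------|',
    ...rows,
  ].join('\n');
}
```

Running this at docs build time would keep the chart in lockstep with the model definitions instead of hand-maintaining it.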
- **Action**: Compacts entire conversation history into summary + artifacts
- **Use Case**: Long conversations with extensive history

#### Sub-Agent Generation Compacting
It's not clear to me what the difference is between these compacting types. Can you add examples to the docs that illustrate the difference?