Skip to content

Fix memory compressor timeout fallback#498

Open
RitwijParmar wants to merge 1 commit into
usestrix:mainfrom
RitwijParmar:ritwij/memory-compressor-timeout-fallback
Open

Fix memory compressor timeout fallback#498
RitwijParmar wants to merge 1 commit into
usestrix:mainfrom
RitwijParmar:ritwij/memory-compressor-timeout-fallback

Conversation

@RitwijParmar

Copy link
Copy Markdown

Summary

  • disable LiteLLM retry amplification for memory-compressor summarization calls
  • add a local extractive fallback summary when compressor LLM calls fail or time out
  • preserve recent scan messages while avoiding long-scan stalls caused by repeated summarization failures

Why

This addresses the reliability failure mode in #470 where long-context scans can get stuck when the memory compressor repeatedly hits LiteLLM timeouts. The fallback keeps the scan moving while retaining ordered message previews and recent operational context.

Tests

  • PYTEST_ADDOPTS='' .venv/bin/python -m pytest -o addopts='' tests/llm/test_memory_compressor.py
@greptile-apps

greptile-apps Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a reliability failure in long-context scans where repeated LiteLLM summarization timeouts could stall the memory compressor. It adds num_retries=0 to prevent retry amplification and introduces _build_fallback_summary, an extractive local summarizer that preserves head/tail message previews when the LLM call fails.

  • num_retries=0 is passed to litellm.completion so a single timeout does not silently trigger multiple retries.
  • _build_fallback_summary returns a context_summary message with up to 12 sampled previews (head + tail) when the LLM raises any exception; the existing exception handler now delegates to it instead of returning messages[0].
  • The empty-response branch (if not summary.strip(): return messages[0]) was not updated to use the new fallback, leaving one failure mode — an LLM that responds with blank content — still returning a raw uncompressed old message.

Confidence Score: 3/5

The change is almost complete but one branch was not updated consistently with the rest.

The empty-summary guard at line 178 still returns messages[0] directly, which is exactly the raw-old-message return the PR is designed to eliminate. An LLM that responds with an empty string triggers this path and re-creates the context inflation the fix targets. The rest of the change — disabling retries and the fallback builder — is correct and well-tested.

strix/llm/memory_compressor.py line 178 needs the same _build_fallback_summary treatment applied to the exception handler above it.

Important Files Changed

Filename Overview
strix/llm/memory_compressor.py Adds num_retries=0 to suppress LiteLLM retry amplification and introduces _build_fallback_summary as an extractive local fallback on exception; the empty-LLM-response branch at line 178 still returns messages[0] (a raw old message) instead of the fallback, leaving one failure mode unaddressed.
tests/llm/test_memory_compressor.py New test file covering retry-disable, timeout fallback, and compress_history fallback path; the empty-response branch (not summary.strip()) is not exercised so the surviving messages[0] return goes undetected.

Comments Outside Diff (1)

  1. strix/llm/memory_compressor.py, line 177-178 (link)

    P1 When the LLM returns an empty or whitespace-only response, the code still returns messages[0] — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: strix/llm/memory_compressor.py
    Line: 177-178
    
    Comment:
    When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
strix/llm/memory_compressor.py:177-178
When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.

```suggestion
        if not summary.strip():
            return _build_fallback_summary(messages)
```

Reviews (1): Last reviewed commit: "fix memory compressor timeout fallback" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant