Support large target repos with bind-mount option. #577

Merged
bearsyankees merged 10 commits into usestrix:main from mhspektr:feature/492-support-large-repos on Jun 22, 2026

Conversation

@mhspektr

Contributor

Local --target dirs are copied into the sandbox file-by-file, which stalls on large repos and can leave /workspace empty.

What's new:

  • --mount <path> bind-mounts a local dir into the sandbox read-only instead of copying it — ideal for big monorepos.
  • A size pre-flight: oversized non-mounted targets (STRIX_MAX_LOCAL_COPY_MB, default 1024) exit early and suggest --mount.

Tests and docs included. Fixes #492.
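
For readers who want a concrete picture of the size pre-flight described above, here is a minimal sketch. It is not the PR's code: `tree_size_bytes` and `oversized_targets` are hypothetical names, and only the 1024 MB default and the "`<= 0` disables the check" rule come from this PR's description and commit notes.

```python
import os
from pathlib import Path

DEFAULT_MAX_LOCAL_COPY_MB = 1024  # default stated in the PR description


def tree_size_bytes(path: Path) -> int:
    """Sum the sizes of regular files under path without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False):
        for name in files:
            file_path = Path(root) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def oversized_targets(paths: list[Path], max_mb: int) -> list[tuple[Path, int]]:
    """Return (path, size_bytes) for every target larger than max_mb."""
    if max_mb <= 0:  # per the commit notes, <= 0 disables the pre-flight
        return []
    max_bytes = max_mb * 1024 * 1024
    sized = [(p, tree_size_bytes(p)) for p in paths]
    return [(p, size) for p, size in sized if size > max_bytes]
```

A caller would print each offending path with a hint to pass it via `--mount` and exit before any copy starts.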

mhspektr added 8 commits June 18, 2026 22:45
- Change RuntimeError to TypeError for type validation in report/writer.py
- Update pyupgrade to v3.21.2 for Python 3.14 compatibility

Mirror the layout introduced on feature/438-token_budget: pytest +
pytest-asyncio dev deps, asyncio_mode auto, a tests.* mypy override, and
pytest in the mypy pre-commit hook deps so the tests/ package type-checks.

…ix#492)

Large local targets were copied into the sandbox file-by-file via the SDK
LocalDir entry, which stalls on big repos and could leave /workspace empty.

- --mount <path> bind-mounts a host directory read-only at /workspace/<subdir>
  instead of copying it, bypassing the per-file stream.
- A size pre-flight (STRIX_MAX_LOCAL_COPY_MB, default 1024) fails fast with a
  clear message suggesting --mount when a non-mounted local target is too big.

An empty or whitespace-only --mount value resolves to the current working
directory and would silently bind-mount it into the sandbox. Reject it.

If the same directory is passed via --target and --mount (or as duplicate
values), it previously produced two targets (copied AND bind-mounted), and
the copied one could trip the size pre-flight. Dedupe by resolved path,
preferring the bind mount.

Previously a value of 0 (or negative) made every local target count as
oversized, aborting all local scans. Now <= 0 disables the pre-flight.

os.walk silently swallowed directory-listing errors, so a permission-denied
subtree could make a large repo under-count and slip past the pre-flight.
Surface such omissions via an onerror warning.

Add CLI reference + example for --mount, document the size pre-flight env var,
note the read-only-is-not-a-hard-boundary caveat and that remote repos are not
size-checked, and clarify the backends docstring on when bind mounts apply.
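
Two of the fixes in the commit notes above (rejecting an empty `--mount` value and deduplicating by resolved path with the bind mount winning) can be sketched as follows. This is an illustration under assumptions, not the code merged here: `resolve_mount`, `dedupe_by_resolved_path`, and the `{"path", "mount"}` dict shape are hypothetical.

```python
from pathlib import Path


def resolve_mount(raw: str) -> Path:
    """Resolve a --mount value, refusing empty or whitespace-only input."""
    # Path("") is Path("."), so an empty value would resolve to the current
    # working directory and get mounted without the user asking for it.
    if not raw.strip():
        raise ValueError("--mount requires a non-empty path")
    return Path(raw.strip()).expanduser().resolve()


def dedupe_by_resolved_path(targets: list[dict]) -> list[dict]:
    """Keep one entry per resolved path; a mounted entry replaces a copied one."""
    chosen: dict[Path, dict] = {}
    for target in targets:
        key = Path(target["path"]).expanduser().resolve()
        current = chosen.get(key)
        if current is None or (target["mount"] and not current["mount"]):
            chosen[key] = target
    return list(chosen.values())
```

Keying on the resolved path means `.` and the absolute path of the same directory collapse into a single target, so the copied duplicate never reaches the size pre-flight.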
@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Contributor

Greptile Summary

This PR adds a --mount option that bind-mounts a local directory into the sandbox read-only instead of streaming it file-by-file, and a size pre-flight check (STRIX_MAX_LOCAL_COPY_MB, default 1024 MB) that rejects oversized non-mounted targets before the copy begins.

  • build_mount_targets_info / dedupe_local_targets / find_oversized_local_targets in utils.py handle CLI parsing, deduplication (mount wins over copy for the same path), and size enforcement.
  • build_session_entries in session_manager.py splits sources into SDK LocalDir manifest entries (copied) and host bind-mount specs passed to the Docker backend at container-create time via DockerSDKMount.
  • Test coverage is solid across all new helpers.
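
As a rough illustration of the copied-versus-mounted split summarized above, the sketch below partitions sources into entries to copy and read-only bind-mount specs. The dict keys (`source_path`, `workspace_subdir`, `mount`), the `BindMount` dataclass, and the choice to skip incomplete entries are assumptions made for this example; only the `/workspace/<subdir>` target and the read-only flag come from the PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindMount:
    source: str            # host directory
    target: str            # path inside the sandbox
    read_only: bool = True


def split_sources(sources: list[dict]) -> tuple[dict[str, str], list[BindMount]]:
    """Partition sources into {subdir: host_path} copies and bind mounts."""
    copied: dict[str, str] = {}
    mounts: list[BindMount] = []
    for src in sources:
        path = src.get("source_path")
        subdir = src.get("workspace_subdir")
        if not path or not subdir:
            continue  # incomplete entry; assumed to be skipped
        if src.get("mount"):
            mounts.append(BindMount(source=path, target=f"/workspace/{subdir}"))
        else:
            copied[subdir] = path
    return copied, mounts
```

Only the `copied` half would go through the per-file stream; the `mounts` half is handed to the container backend at create time.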

Confidence Score: 5/5

Safe to merge. The bind-mount path is well-isolated and the size pre-flight has no effect on correctness — only on UX for large trees.

The core logic (manifest split, Docker mount injection, deduplication, path resolution) is correct and comprehensively tested. The one improvement opportunity — adding an early-exit to the directory-size walk — is a performance nicety, not a correctness issue.
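
The "Docker mount injection" step relies on `setdefault` so that extra bind mounts are appended to any mounts already declared rather than overwriting them. A minimal sketch of that pattern, using plain dicts in place of real Docker mount objects (the function name and dict shape are hypothetical):

```python
def add_bind_mounts(create_kwargs: dict, bind_mounts: list[dict]) -> dict:
    """Append bind mounts to create_kwargs["mounts"], creating the list if absent."""
    if bind_mounts:
        # setdefault returns the existing list when one is present, so mounts
        # declared earlier are preserved and the new ones are added after them.
        create_kwargs.setdefault("mounts", []).extend(bind_mounts)
    return create_kwargs
```

A plain `create_kwargs["mounts"] = bind_mounts` would silently drop whatever mounts were already in place.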

strix/interface/utils.py (directory_size_bytes early-exit suggestion)

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/interface/utils.py | Adds five new helpers: directory_size_bytes, find_oversized_local_targets, build_mount_targets_info, dedupe_local_targets, and extends collect_local_sources with mount flag. Logic is correct; directory_size_bytes has no early-exit once the limit is exceeded. |
| strix/interface/main.py | Adds --mount CLI argument, wires it into argument validation (resume guard, required-args check), runs dedup then size pre-flight before parsing completes. Flow and error handling are correct. |
| strix/runtime/session_manager.py | Extracts build_session_entries to split local sources into copied manifest entries and bind mounts; passes bind_mounts to the backend. Logic is clean and well-tested. |
| strix/runtime/docker_client.py | Adds strix_bind_mounts class attribute and appends DockerSDKMount objects to create_kwargs before container creation. Correctly uses setdefault to coexist with manifest-declared mounts. |
| strix/runtime/backends.py | Adds bind_mounts parameter to _docker_backend and assigns it to client.strix_bind_mounts before create(). Clean and minimal change. |
| tests/test_local_sources.py | Comprehensive unit tests for directory_size_bytes, find_oversized_local_targets, build_mount_targets_info, dedupe_local_targets, and collect_local_sources. Good edge-case coverage including POSIX permissions and empty-path rejection. |
| tests/test_session_entries.py | Tests build_session_entries split logic for copied vs mounted sources; covers mixed, incomplete, and pure-mount inputs. |
| strix/config/settings.py | Adds max_local_copy_mb field to RuntimeSettings with sensible 1024 MB default and STRIX_MAX_LOCAL_COPY_MB alias. |
| strix/core/inputs.py | Appends ", read-only mount" suffix to the local-codebase task description when the target is bind-mounted; minor and correct. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/interface/utils.py:1220-1245
`directory_size_bytes` walks the entire tree even when the cumulative size already exceeds the caller's limit. For a 100 GB repository with the default 1 GB cap, the CLI will stat every file before `find_oversized_local_targets` can reject it. Adding a `limit` parameter with an early-return makes the worst-case proportional to the limit, not the total size.

```suggestion
def directory_size_bytes(path: Path, limit: int = 0) -> int:
    """Total size in bytes of regular files under ``path`` (symlinks not followed).

    Best-effort: files that disappear or can't be stat'd mid-walk are skipped.
    Used as a cheap (stat-only) pre-flight to estimate the cost of streaming a
    local target into the sandbox before we actually try to copy it.

    Directories that can't be listed (e.g. permission denied) are logged and
    skipped rather than silently dropped — so an under-count is at least
    visible — but the returned total then excludes their contents.

    When ``limit`` is positive the walk stops as soon as the running total
    exceeds it. The returned value is then a partial total (a lower bound on
    the true size), which is fine for an "is this too big?" pre-flight check.
    """

    def _on_walk_error(error: OSError) -> None:
        logger.warning("Could not read %s while measuring size: %s", error.filename, error)

    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False, onerror=_on_walk_error):
        for name in files:
            file_path = os.path.join(root, name)  # noqa: PTH118
            try:
                if os.path.islink(file_path):  # noqa: PTH114
                    continue
                total += os.path.getsize(file_path)  # noqa: PTH202
            except OSError:
                continue
        if limit > 0 and total > limit:
            return total
    return total
```

### Issue 2 of 2
strix/interface/utils.py:1269-1270
Pass `max_bytes` as the early-exit `limit` so the walk stops as soon as it is clear the target is oversized, rather than always traversing the full tree.

```suggestion
        size = directory_size_bytes(Path(target_path), limit=max_bytes)
        if size > max_bytes:
```

Reviews (2): Last reviewed commit: "Update strix/runtime/docker_client.py"

Comment thread strix/interface/main.py Outdated
Comment thread strix/runtime/docker_client.py Outdated
mhspektr and others added 2 commits June 19, 2026 07:34
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@bearsyankees

Collaborator

LGTM!

@bearsyankees bearsyankees merged commit 7141ccf into usestrix:main Jun 22, 2026
1 check passed
@mhspektr mhspektr deleted the feature/492-support-large-repos branch June 24, 2026 05:23
@mhspektr

Contributor Author

@bearsyankees how do you feel about #583? :)

@bearsyankees

Collaborator

great, @0xallam reviewing now, apologies we have quite a big backlog of stuff we are working through @mhspektr
