
Emit line numbers in grep/sgrep output #3

Merged: Prasanna721 merged 6 commits into main from grep-line-numbers on Apr 28, 2026.

Dhravya (Member) commented Apr 27, 2026

Summary

  • Resolves each matched chunk back to its source file (or transcription sidecar) and prints the matching line range in grep output, e.g. /notes.md:12-15:chunk text
  • Falls back to whitespace-normalised matching when the verbatim search misses
  • Preserves the old file:content format when no local mount is available
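For illustration, here is a minimal Rust sketch of the verbatim path and the two output formats described above. The names `verbatim_range` and `format_hit` are hypothetical and not taken from crates/smfs/src/cmd/grep.rs. It assumes the chunk text is printed unchanged after the range and that a single-line hit prints as `start-start`; it also leaves out the whitespace-normalised fallback and the sidecar lookup.

```rust
/// Hypothetical helper: 1-based (start, end) line range of the first
/// verbatim occurrence of `chunk` in `file`, or None if it is absent.
fn verbatim_range(file: &str, chunk: &str) -> Option<(usize, usize)> {
    if chunk.is_empty() {
        return None;
    }
    let start = file.find(chunk)?;
    // Newlines before the match determine the starting line.
    let start_line = file[..start].matches('\n').count() + 1;
    // A trailing newline ends the last line rather than starting a new one.
    let extra = chunk.trim_end_matches('\n').matches('\n').count();
    Some((start_line, start_line + extra))
}

/// Hypothetical formatter: `path:start-end:text` when the chunk resolves in a
/// local copy of the file, otherwise the old `path:text` form.
fn format_hit(path: &str, local: Option<&str>, chunk: &str) -> String {
    match local.and_then(|file| verbatim_range(file, chunk)) {
        Some((start, end)) => format!("{path}:{start}-{end}:{chunk}"),
        None => format!("{path}:{chunk}"),
    }
}
```

Taking the local file as an `Option<&str>` keeps the "no mount" case and the "chunk not found" case on the same code path, so both fall back to the old format.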

Test plan

  • Run sgrep against a mounted container and verify line numbers appear
  • Confirm PDF/image transcription sidecars resolve correctly
  • Verify fallback to no-line-number format when mount is absent

🤖 Generated with Claude Code

Dhravya and others added 2 commits April 26, 2026 22:47
Resolve each matched chunk back to its source file (or transcription
sidecar) and print the matching line range, e.g.
`/notes.md:12-15:chunk text`.  Falls back to whitespace-normalised
matching when the verbatim search misses, keeping the old
`file:content` format when no local file is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
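The whitespace-normalised fallback can be sketched as follows. This assumes that "normalised" means collapsing every whitespace run to a single space and trimming both ends. It does not reproduce the walker the PR ends up with. Instead it uses a precomputed table that maps each byte of the normalised file back to a byte offset in the original, so a match in normalised text converts directly to line numbers. `normalize_with_map`, `line_of` and `fuzzy_line_range` are hypothetical names.

```rust
/// Collapse each whitespace run into one ASCII space and trim both ends,
/// recording for every byte of the result the byte offset in `src` it came from.
fn normalize_with_map(src: &str) -> (String, Vec<usize>) {
    let mut out = String::with_capacity(src.len());
    let mut map = Vec::with_capacity(src.len());
    // Start offset of a whitespace run that has not been emitted yet.
    let mut pending_ws: Option<usize> = None;
    for (i, c) in src.char_indices() {
        if c.is_whitespace() {
            // Leading whitespace (empty output) is dropped; later runs are
            // emitted lazily, so a trailing run is dropped too.
            if pending_ws.is_none() && !out.is_empty() {
                pending_ws = Some(i);
            }
        } else {
            if let Some(ws) = pending_ws.take() {
                out.push(' ');
                map.push(ws);
            }
            // One map entry per UTF-8 byte of `c`, each pointing at its start.
            map.extend(std::iter::repeat(i).take(c.len_utf8()));
            out.push(c);
        }
    }
    (out, map)
}

/// 1-based line number of byte offset `byte` in `text`.
fn line_of(text: &str, byte: usize) -> usize {
    text[..byte].matches('\n').count() + 1
}

/// Fallback: 1-based (start, end) lines of `chunk` in `file`, ignoring
/// differences in whitespace. Returns None if the chunk is blank or absent.
fn fuzzy_line_range(file: &str, chunk: &str) -> Option<(usize, usize)> {
    let (needle, _) = normalize_with_map(chunk);
    if needle.is_empty() {
        return None;
    }
    let (hay, map) = normalize_with_map(file);
    let start = hay.find(&needle)?;
    // Last byte of the match. It belongs to a non-whitespace char because the
    // needle is trimmed, so `map` sends it to that char's start in `file`.
    let last = start + needle.len() - 1;
    Some((line_of(file, map[start]), line_of(file, map[last])))
}
```

Every offset stored in the table is a char boundary in the original, so `line_of` never slices inside a multibyte character. The trade-off is memory: the table holds one `usize` per normalised byte, which a walker over the original text avoids.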

vorflux (Bot, Contributor) left a comment

Reviewed: found 2 issues with the line-number emission logic.


Two comment threads on crates/smfs/src/cmd/grep.rs (marked outdated)
- Walker now skips through whole whitespace runs and lands on the next
  non-whitespace char that maps to the normalized position. Previous
  version stopped 1-2 chars short on chunks crossing blank lines or
  with leading whitespace.
- Verbatim end-line uses the last char's start byte (avoids slicing
  mid-multibyte UTF-8 char and panicking on chunks ending in non-ASCII).
- Per-loop file cache so each source file is read once across all hits.
- Extracted line_range_in_file/read_local_or_sidecar as pure helpers
  with unit tests covering the walker's edge cases.
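The UTF-8 fix and the per-loop cache from this commit can be sketched together. `verbatim_line_range` and `FileCache` are illustrative names and are not the PR's `line_range_in_file`/`read_local_or_sidecar`. The loader closure is an assumption that stands in for whatever reads the local file or its transcription sidecar.

```rust
use std::collections::HashMap;

/// Verbatim match: 1-based (start, end) lines of `chunk` inside `file`.
/// The end line is taken from the start byte of the chunk's last char, so
/// slicing never lands inside a multibyte UTF-8 sequence.
fn verbatim_line_range(file: &str, chunk: &str) -> Option<(usize, usize)> {
    let start = file.find(chunk)?;
    // None for an empty chunk, which has no last char.
    let (last_off, _) = chunk.char_indices().last()?;
    let last = start + last_off;
    let line = |b: usize| file[..b].matches('\n').count() + 1;
    Some((line(start), line(last)))
}

/// Per-loop cache: each path is loaded at most once, however many hits it has.
/// A `None` result is cached too, so a path with no local copy is not re-probed.
struct FileCache<F: FnMut(&str) -> Option<String>> {
    load: F,
    files: HashMap<String, Option<String>>,
}

impl<F: FnMut(&str) -> Option<String>> FileCache<F> {
    fn new(load: F) -> Self {
        Self { load, files: HashMap::new() }
    }

    /// Return the cached contents of `path`, loading them on first access.
    fn get(&mut self, path: &str) -> Option<&str> {
        if !self.files.contains_key(path) {
            let loaded = (self.load)(path);
            self.files.insert(path.to_string(), loaded);
        }
        self.files.get(path).and_then(|c| c.as_deref())
    }
}
```

In a grep loop, one cache would be created before iterating over the hits. Each hit would then call `cache.get(path)` followed by `verbatim_line_range`, and fall back to the plain `path:text` form when either returns `None`.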
Prasanna721 merged commit 8affe69 into main on Apr 28, 2026
4 checks passed
Prasanna721 deleted the grep-line-numbers branch on May 7, 2026 21:13

2 participants