From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: "Toon Claes" <toon@iotcl.com>,
git@vger.kernel.org, "Patrick Steinhardt" <ps@pks.im>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH 0/8] Introduce git-blame-tree(1) command
Date: Thu, 27 Mar 2025 17:58:19 -0400 [thread overview]
Message-ID: <Z+XJ+1L3PnC9Dyba@nand.local> (raw)
In-Reply-To: <20250327063243.GB3042475@coredump.intra.peff.net>
On Thu, Mar 27, 2025 at 02:32:43AM -0400, Jeff King wrote:
> The pathspec-trie stuff is, I think, still a reasonable idea for general
> use. But IIRC, the rewritten blame-tree you guys worked on does not
> benefit from it, because it ditches pathspecs entirely (both because
> they're too slow without the tries, but also because it's important to
> continually narrow the pathspec while traversing). That trie code was
> never run in production, I think (and I see there is a patch to narrow
> the pathspec while traversing; I suspect that likewise was never used).
Yeah, the rewritten blame-tree code uses changed-path Bloom filters to
narrow the set of revisions that we need to actually compute tree-diffs
for.
The general idea is that we have a set of paths that we have yet to
blame, and those are the "interesting" ones. IOW, if a changed-path
Bloom filter tells us that we are at some revision where there is maybe
a change to one or more unblamed paths, we need to compute a tree-diff.
But if the Bloom filter says "no", then we can skip the tree-diff at
that layer entirely.
> The max-depth diff code is also in theory a reasonable thing to have in
> general. But it is awkward to use, and not really necessary for
> blame-tree. There we really only care about recursing vs not recursing,
> but the usual "recursive" flag for diffing isn't enough (we have to
> recurse down to the tree of interest, but may not want to go further). I
> don't remember how that is handled in your blame-tree rewrites.
I think that's mostly true, but the blame-tree caching stuff that Stolee
worked on many years ago and mentioned below does require it IIRC.
> So yeah. I don't know if all of this is really a very good starting
> point. Taylor, if you can share the current code that GitHub is running,
> I think that would be beneficial for the community.
Sure. You can fetch from the 'tb/blame-tree' branch from my tree (which
is located at 'git@github.com:ttaylorr/git.git'). I owe a huge "thank
you" to Victoria Dye, who split out the various topics from GitHub's
fork into individual rebased branches.
There were many more patches on top that came after Victoria's split
above, and I applied those manually. The commit structure probably needs
significant clean-up and polishing before it's ready for serious review,
since this is more-or-less a raw dump of the work on GitHub's side over
more than a decade.
It also doesn't pass the tests in t9932 (and the test number should
probably also be reworked, it's in the t99xx range so that inclusion in
GitHub's fork doesn't cause collisions with new tests when we merge
upstream). To that end, I removed everybody's Signed-off-by in case I
have mangled their work in some way unintentionally.
Thanks,
Taylor
next prev parent reply other threads:[~2025-03-27 21:58 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-26 20:18 [PATCH 0/8] Introduce git-blame-tree(1) command Toon Claes
2025-03-26 20:18 ` [PATCH 1/8] combine-diff: zero memory used for callback filepairs Toon Claes
2025-03-26 20:18 ` [PATCH 2/8] within_depth: fix return for empty path Toon Claes
2025-03-26 20:18 ` [PATCH 3/8] diff: teach tree-diff a max-depth parameter Toon Claes
2025-03-26 20:18 ` [PATCH 4/8] blame-tree: introduce new subcommand to blame files Toon Claes
2025-03-26 20:18 ` [PATCH 5/8] pathspec: add optional trie index Toon Claes
2025-03-26 20:18 ` [PATCH 6/8] pathspec: turn on tries when appropriate Toon Claes
2025-03-26 20:18 ` [PATCH 7/8] tree-diff: use pathspec tries Toon Claes
2025-03-26 20:18 ` [PATCH 8/8] blame-tree: narrow pathspec as we traverse Toon Claes
2025-03-26 20:38 ` [PATCH 0/8] Introduce git-blame-tree(1) command Taylor Blau
2025-03-27 6:32 ` Jeff King
2025-03-27 21:58 ` Taylor Blau [this message]
2025-03-28 13:27 ` Toon Claes
2025-04-01 9:29 ` Jeff King
2025-03-27 12:04 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z+XJ+1L3PnC9Dyba@nand.local \
--to=me@ttaylorr.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).