stas00 (Collaborator) commented Nov 3, 2025

Make it very clear that `TiledMLP`'s memory saving has a cost of recomputing forward.
@stas00 stas00 enabled auto-merge (squash) November 3, 2025 16:53
@stas00 stas00 merged commit 76a4075 into master Nov 3, 2025
12 checks passed
@stas00 stas00 deleted the stas00-patch-1 branch November 3, 2025 18:47
stas00 (Collaborator, Author) commented Nov 3, 2025

Thank you, Masahiro

kidlj commented Nov 4, 2025

I just noticed that there's a typo in this commit: occurs trice.

stas00 (Collaborator, Author) commented Nov 4, 2025

That's not a typo, the forward computation does occur thrice (three times):

  1. normal forward
  2. activation checkpointing forward
  3. backward's internal forward-like recomputation.

In a normal, non-tiled module only steps 1 and 2 occur.

This is the price of saving memory: about 25% more computation.
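One way to arrive at the 25% figure, as a rough sketch: assume a backward pass costs about twice a forward pass (a common rule of thumb, not something stated in this thread). The function name `training_step_cost` and the cost units are hypothetical, chosen only for illustration.

```python
def training_step_cost(forward_passes: int,
                       forward_cost: float = 1.0,
                       backward_cost: float = 2.0) -> float:
    """Relative compute for one training step of a module.

    Assumption (not from the thread): backward costs ~2x a forward pass.
    """
    return forward_passes * forward_cost + backward_cost


# Non-tiled module with activation checkpointing:
# normal forward + checkpointing forward (steps 1 and 2).
baseline = training_step_cost(forward_passes=2)

# Tiled module: steps 1, 2 and the forward-like recomputation
# inside backward (step 3).
tiled = training_step_cost(forward_passes=3)

overhead = tiled / baseline - 1.0
print(f"extra compute: {overhead:.0%}")
```

Under these assumptions the tiled module costs 5 units against 4 for the baseline, which matches the 25% quoted above. A different backward-to-forward cost ratio would shift the number.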

aeeeeeep pushed a commit to aeeeeeep/DeepSpeed that referenced this pull request Nov 13, 2025
Make it very clear that `TiledMLP`'s memory saving has a cost of
recomputing forward.

Signed-off-by: aeeeeeep <aeeeeeep@proton.me>
rraminen pushed a commit to rraminen/DeepSpeed that referenced this pull request Dec 1, 2025
Make it very clear that `TiledMLP`'s memory saving has a cost of
recomputing forward.

Signed-off-by: rraminen <rraminen@amd.com>
aeeeeeep pushed a commit to aeeeeeep/DeepSpeed that referenced this pull request Jan 15, 2026
Make it very clear that `TiledMLP`'s memory saving has a cost of
recomputing forward.

Signed-off-by: aeeeeeep <aeeeeeep@proton.me>