An idea for partial sequential CPU offloading #12749

@rodjjo

Just an idea.

It's not a problem or anything...

I've been using a custom offload for my potato GPU. Maybe there is another way to do it...

In short, I've been using sequential offloading for a long time. When I enable it, it uses a minimal amount of VRAM. However, I know it could use more VRAM to do less I/O, so I created a mixin for partial CPU offload where the model keeps several layers on the GPU and offloads only the rest.
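
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how much host-to-device traffic partial offloading could save per forward pass. It assumes every layer has the same size and that each offloaded layer is copied to the GPU once per pass. The function name, layer count, layer size, and bandwidth below are illustrative assumptions, not measurements from my setup.

```python
def transfer_seconds_per_pass(num_layers, layers_keep_gpu, layer_bytes, bandwidth_bytes_per_s):
    """Estimate the time spent copying offloaded layers to the GPU in one forward pass.

    Assumes uniform layer sizes and one host-to-device copy per offloaded layer.
    """
    if not 0 <= layers_keep_gpu <= num_layers:
        raise ValueError("layers_keep_gpu must be between 0 and num_layers")
    offloaded_layers = num_layers - layers_keep_gpu
    return offloaded_layers * layer_bytes / bandwidth_bytes_per_s


# Hypothetical numbers, chosen only to illustrate the calculation.
NUM_LAYERS = 30
LAYER_BYTES = 400 * 1024**2   # 400 MiB per layer (assumed)
BANDWIDTH = 8 * 1024**3       # 8 GiB/s effective transfer rate (assumed)

full_offload = transfer_seconds_per_pass(NUM_LAYERS, 0, LAYER_BYTES, BANDWIDTH)
partial_offload = transfer_seconds_per_pass(NUM_LAYERS, 22, LAYER_BYTES, BANDWIDTH)
print(f"sequential: {full_offload:.2f} s/pass, partial: {partial_offload:.2f} s/pass")
```

Under these assumptions the transfer time scales linearly with the number of offloaded layers, so the saving adds up over every denoising step.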

See code here: https://gist.github.com/rodjjo/20e2e842fea9ed58114adb560a4566b6

    import torch
    from transformers import Qwen3ForCausalLM

    # PartialOffloadMixin comes from the gist linked above.
    class MyQwen3ForCausalLM(Qwen3ForCausalLM, PartialOffloadMixin):
        LAYERS_KEEP_GPU = 22
        MODEL_ATTR_NAME = "model"
        MODEL_LAYERS_ATTR_NAME = "layers"
        OFFLOAD_ON_CALL = True

    model = MyQwen3ForCausalLM.from_pretrained(
        repo_id,
        subfolder="text_encoder",
        local_files_only=True,
        torch_dtype=torch.bfloat16,
    )
    model.eval()
    model.enable_partial_cpu_offload()

    # Pseudocode for inference
    result = model(...)  # __call__ is overridden and calls go_gpu(True), then go_gpu(False)

Example with a transformer:

    import torch
    from diffusers import ZImageTransformer2DModel

    # PartialOffloadMixin comes from the gist linked above.
    class MyZImageTransformer(ZImageTransformer2DModel, PartialOffloadMixin):
        MODEL_LAYERS_ATTR_NAME = "layers"
        LAYERS_KEEP_GPU = 22

    model = MyZImageTransformer.from_pretrained(
        repo_id,
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
    )
    model.eval()
    model.enable_partial_cpu_offload()

    # Denoising step
    model.go_gpu(True)
    while denoising:  # pseudocode
        predicted = model(...)
    model.go_gpu(False)
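
One possible refinement for the manual `go_gpu(True)` / `go_gpu(False)` bracketing above is to wrap it in a context manager, so the layers are offloaded again even if a denoising step raises. This is only a sketch: `on_gpu` and `DummyModel` are hypothetical names that are not part of the gist, and it assumes `go_gpu` behaves as in the example above.

```python
from contextlib import contextmanager


@contextmanager
def on_gpu(model):
    """Call model.go_gpu(True) on entry and model.go_gpu(False) on exit."""
    model.go_gpu(True)
    try:
        yield model
    finally:
        # Runs even if the body raises, so go_gpu(False) is never skipped.
        model.go_gpu(False)


class DummyModel:
    """Stand-in for a model using the mixin; it only records go_gpu calls."""

    def __init__(self):
        self.calls = []

    def go_gpu(self, enabled):
        self.calls.append(enabled)

    def __call__(self, x):
        return x + 1


model = DummyModel()
with on_gpu(model) as m:
    latents = 0
    for _ in range(3):  # stands in for the denoising loop
        latents = m(latents)
```

With a real model, the body of the `with` block would be the denoising loop, and nothing else in the pipeline would need to remember the closing `go_gpu(False)`.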

It saves me 12 to 13 seconds of inference with Z-Image Turbo (my custom pipeline with this partial layer offloading):

Before (normal sequential offloading):
Image

After (partial sequential offloading):
Image
