Skip to content

Improve typing support and flexibility for RopeParameters #41964

@RyanMullins

Description

@RyanMullins

Feature request

The new RopeParameters data structure is really improving the clarity of RoPE configurations across the library. However, I have noticed two areas for improvement:

  1. Type hints and docstrings for key functions, such as modeling_rope_utils.standardize_rope_params(), to clarify and reflect their relationship to the RopeParameters structure.
  2. Loosening typing enforcement on RopeParameters such that you only need to specify the flags relevant for your model's RoPE needs by setting total=False in the class definition and using the Required type annotation for the few parameters that really do need to to be there.

Motivation

  1. Strong typing is almost always better, in my opinion.
  2. Having to specify the every RopeParameters property is annoying when you only use a couple of them. Consider Gemma 3. Under the current definition of RopeParameters, it requires 28 lines to satisfy the type checker:
    rope_params = {
        "full_attention": RopeParameters(
            rope_type="linear",
            rope_theta=1_000_000.0,
            factor=8,
            original_max_position_embeddings=None,
            attention_factor=None,
            beta_fast=None,
            beta_slow=None,
            short_factor=None,
            long_factor=None,
            low_freq_factor=None,
            high_freq_factor=None,
        ),
        "sliding_attention": RopeParameters(
            rope_type="default",
            rope_theta=10000.0,
            factor=None,
            original_max_position_embeddings=None,
            attention_factor=None,
            beta_fast=None,
            beta_slow=None,
            short_factor=None,
            long_factor=None,
            low_freq_factor=None,
            high_freq_factor=None,
        ),
    }
    When this could be expressed in 11 lines:
    rope_params = {
        "full_attention": RopeParameters(
            rope_type="linear",
            rope_theta=1_000_000.0,
            factor=8,
        ),
        "sliding_attention": RopeParameters(
            rope_type="default",
            rope_theta=10000.0,
        ),
    }

Your contribution

I'm willing to do all of this. Also, I've already been making some progress against these objectives in #41934 and #41963.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions