Labels: Feature request
Feature request
The new `RopeParameters` data structure is really improving the clarity of RoPE configurations across the library. However, I have noticed two areas for improvement:
- Type hints and docstrings for key functions, such as `modeling_rope_utils.standardize_rope_params()`, that clarify and reflect their relationship to the `RopeParameters` structure.
- Loosening type enforcement on `RopeParameters` so that you only need to specify the flags relevant to your model's RoPE needs, by setting `total=False` in the class definition and using the `Required` type annotation for the few parameters that really do need to be there.
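Here is a minimal sketch of what the proposed definition could look like. It assumes Python 3.11+ `typing` (or the `typing_extensions` backport on older versions), and it assumes `rope_type` and `rope_theta` are the fields that stay required. The key list mirrors the Gemma 3 example, but the value types are my guesses, not the library's current definition.

```python
try:
    from typing import Required, TypedDict  # Python 3.11+
except ImportError:  # Python 3.9/3.10: fall back to the typing_extensions backport
    from typing_extensions import Required, TypedDict


class RopeParameters(TypedDict, total=False):
    """Hypothetical partially-required RoPE config (a sketch, not the library's class).

    total=False makes every key optional by default; Required[...] opts
    the assumed-mandatory keys back in.
    """

    rope_type: Required[str]
    rope_theta: Required[float]
    factor: float
    original_max_position_embeddings: int
    attention_factor: float
    beta_fast: float
    beta_slow: float
    short_factor: list[float]
    long_factor: list[float]
    low_freq_factor: float
    high_freq_factor: float


# The class records which keys a type checker will insist on.
print(sorted(RopeParameters.__required_keys__))
# → ['rope_theta', 'rope_type']
```

With this shape, a type checker such as mypy or pyright would flag a missing `rope_type` or `rope_theta` but accept omitting any of the other keys.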
Motivation
- Strong typing is almost always better, in my opinion.
- Having to specify every `RopeParameters` property is annoying when you only use a couple of them. Consider Gemma 3. Under the current definition of `RopeParameters`, it takes 28 lines to satisfy the type checker:

  ```python
  rope_params = {
      "full_attention": RopeParameters(
          rope_type="linear",
          rope_theta=1_000_000.0,
          factor=8,
          original_max_position_embeddings=None,
          attention_factor=None,
          beta_fast=None,
          beta_slow=None,
          short_factor=None,
          long_factor=None,
          low_freq_factor=None,
          high_freq_factor=None,
      ),
      "sliding_attention": RopeParameters(
          rope_type="default",
          rope_theta=10000.0,
          factor=None,
          original_max_position_embeddings=None,
          attention_factor=None,
          beta_fast=None,
          beta_slow=None,
          short_factor=None,
          long_factor=None,
          low_freq_factor=None,
          high_freq_factor=None,
      ),
  }
  ```

  The same configuration could be expressed in 11 lines:

  ```python
  rope_params = {
      "full_attention": RopeParameters(
          rope_type="linear",
          rope_theta=1_000_000.0,
          factor=8,
      ),
      "sliding_attention": RopeParameters(
          rope_type="default",
          rope_theta=10000.0,
      ),
  }
  ```
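A sketch of the 11-line form at runtime, assuming a `total=False` definition where `rope_type` and `rope_theta` are the only required keys (my assumption; the class below is trimmed to the keys used and is not the library's current definition). One consequence worth noting: an omitted key is absent from the dict rather than set to `None`, so consuming code should read optional keys with `.get()`. `scaling_factor` is a hypothetical helper, and its `1.0` default (meaning "no scaling") is also an assumption.

```python
try:
    from typing import Required, TypedDict  # Python 3.11+
except ImportError:  # Python 3.9/3.10: fall back to the typing_extensions backport
    from typing_extensions import Required, TypedDict


class RopeParameters(TypedDict, total=False):
    # Hypothetical proposed definition, trimmed to the keys used below.
    rope_type: Required[str]
    rope_theta: Required[float]
    factor: float


# Gemma 3 style per-attention-type config: only the relevant flags are given.
rope_params: dict[str, RopeParameters] = {
    "full_attention": RopeParameters(
        rope_type="linear",
        rope_theta=1_000_000.0,
        factor=8,
    ),
    "sliding_attention": RopeParameters(
        rope_type="default",
        rope_theta=10000.0,
    ),
}


def scaling_factor(params: RopeParameters) -> float:
    """Hypothetical helper: omitted keys are missing (not None), so use .get()."""
    return params.get("factor", 1.0)


print(scaling_factor(rope_params["full_attention"]))     # 8
print(scaling_factor(rope_params["sliding_attention"]))  # 1.0
print("factor" in rope_params["sliding_attention"])      # False
```

At runtime a `TypedDict` call just builds a plain `dict`, so the shorter form costs nothing extra; the stricter checking happens only in the type checker.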
Your contribution
I'm willing to do all of this. I've also already made some progress toward these objectives in #41934 and #41963.