Authors: @HuiyingLi, @akoumpa, @snowmanwwg, @Sylendran95

Introduction
Gemma 3n is a generative AI model that takes inputs from a variety of modalities, including images and audio, and is optimized for efficient resource usage and fast inference on everyday devices. It introduces innovations such as Per-Layer Embedding parameter caching and the MatFormer architecture, which help reduce compute and memory demands, making it ideal for lightweight deployments. Some key highlights:
Optimized architecture featuring MatFormer's nested transformers, Per-Layer Embeddings (PLE), and KV cache sharing. These enable sub-model extraction, reduced GPU memory usage, and faster prefill speeds (see the sketch after this list).
Multimodal capabilities with integrated image and audio encoders alongside the language model, enabling diverse tasks across modalities.
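To give a feel for the MatFormer idea behind sub-model extraction, here is a toy sketch. This is an illustration of the nested-FFN concept, not Gemma 3n's actual internals: an FFN trained so that prefixes of its hidden dimension form valid smaller sub-networks, which can be sliced out at inference time without retraining.

```python
import torch
import torch.nn as nn

class MatFFN(nn.Module):
    """Toy MatFormer-style FFN: prefixes of the hidden dimension are
    themselves usable sub-models (illustration only, not Gemma 3n code)."""

    def __init__(self, d_model: int = 256, d_hidden: int = 1024):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor, frac: float = 1.0) -> torch.Tensor:
        # Keep only the first `frac` of the hidden units: a nested sub-model
        # sliced out of the full network.
        h = max(1, int(self.up.out_features * frac))
        y = torch.relu(x @ self.up.weight[:h].T + self.up.bias[:h])
        return y @ self.down.weight[:, :h].T + self.down.bias

ffn = MatFFN()
x = torch.randn(2, 256)
full = ffn(x)             # full-width model
small = ffn(x, frac=0.5)  # extracted sub-model at half the FFN width
```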
Today, we are excited to announce that NeMo Automodel now supports Gemma 3n, making it easier than ever to load, train, and run inference with Gemma 3n models.
⚡ Fine-tuning Gemma 3n with NeMo Automodel
NeMo Framework's Automodel path ("NeMo Automodel") offers day-0 support for 🤗Hugging Face models via a unified interface for loading and fine-tuning models across modalities, abstracting away backend complexity. With Gemma 3n support, you can:
Load models with a single from_pretrained call (see the sketch after this list)
Fine-tune models using full-parameter training or PEFT (LoRA) with predefined recipes
Accelerate training with kernel optimizations
Leverage FSDP2/nvFSDP for efficient distributed training
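To make the single-call loading concrete, here is a minimal sketch against the 🤗Hugging Face interface that NeMo Automodel builds on. The checkpoint id google/gemma-3n-E2B-it, the auto class, and the dtype/device choices are assumptions based on the public Hugging Face release, not the only supported configuration.

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-3n-E2B-it"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights for everyday GPUs
    device_map="auto",
)

# Text-only prompt here; Gemma 3n also accepts image and audio inputs.
messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(outputs[0], skip_special_tokens=True))
```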
Check out our tutorial on SFT and PEFT for both Gemma 3 and Gemma 3n models!
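As a taste of the PEFT path, here is a minimal LoRA sketch using the generic 🤗 peft library; NeMo Automodel's predefined recipes wrap the equivalent flow end to end. The checkpoint id and target module names below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-3n-E2B-it"  # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections; only these train,
# so memory use stays far below full-parameter fine-tuning.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # prints the small trainable fraction
```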
🔍 Observations
Training Dynamics
During the first hundred optimization steps we observed suspiciously large gradients that quickly stabilized. While the run remains numerically stable after this "warm-up," overall convergence still lags behind Gemma 3. We continue to investigate the source of this discrepancy.
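If you want to watch for these early spikes in your own runs, one simple approach is to log the global gradient norm each step. This is a generic PyTorch sketch, not NeMo Automodel's built-in logging:

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients, for spotting early spikes."""
    norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)).item() if norms else 0.0

# Inside the training loop, after loss.backward():
#   print(f"step {step}: grad_norm={global_grad_norm(model):.2f}")
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # optional
```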
Our preliminary benchmarks of vision and audio capabilities show some gap between Gemma 3n and existing alternatives. We will follow up with more concrete results later.
✨ Conclusion
Gemma 3n brings impressive efficiency and opens up new possibilities for multimodal tasks on devices. With NeMo Automodel, getting started with and fine-tuning these efficient models takes only a few commands!
We look forward to seeing what you build with Gemma 3n and NeMo Automodel. Check out the documentation guide for a full walkthrough, and reach out on GitHub Discussions if you have questions.
🔗 References
Gemma 3n on Hugging Face
NeMo Automodel
NeMo Automodel Gemma 3 Guide