🚀 𝘃𝟮 𝗼𝗳 𝗼𝘂𝗿 𝗽𝗮𝗽𝗲𝗿 “𝗟𝗟𝗠-𝗝𝗘𝗣𝗔” 𝗶𝘀 𝗼𝘂𝘁 𝗼𝗻 𝗮𝗿𝗫𝗶𝘃!

🔍 𝐖𝐡𝐚𝐭’𝐬 𝐧𝐞𝐰?
✅ 𝗦𝗶𝗴𝗻𝗶𝗳𝗶𝗰𝗮𝗻𝘁𝗹𝘆 𝗹𝗼𝘄𝗲𝗿 𝗰𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗼𝘃𝗲𝗿𝗵𝗲𝗮𝗱 — reduced overhead from 𝟮𝟬𝟬% → 𝟮𝟱% using a simple yet effective 𝗿𝗮𝗻𝗱𝗼𝗺 𝗝𝗘𝗣𝗔-𝗹𝗼𝘀𝘀 𝗱𝗿𝗼𝗽𝗼𝘂𝘁.
✅ 𝗕𝗿𝗼𝗮𝗱𝗲𝗿 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀 — extended beyond symmetric 2-view datasets to 𝗡𝗤-𝗢𝗽𝗲𝗻 (Natural Questions for open-domain QA) and 𝗛𝗲𝗹𝗹𝗮𝗦𝘄𝗮𝗴 (sentence completion), and tested on reasoning models like 𝗤𝘄𝗲𝗻𝟯 and 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝗥𝟭-𝗗𝗶𝘀𝘁𝗶𝗹𝗹𝗲𝗱.
✅ 𝗥𝗶𝗴𝗼𝗿𝗼𝘂𝘀 𝗮𝗯𝗹𝗮𝘁𝗶𝗼𝗻𝘀 — the JEPA loss design outperforms alternatives including 𝗟𝟮, 𝗠𝗦𝗘, 𝗽𝗿𝗲𝗽𝗲𝗻𝗱 [𝗣𝗥𝗘𝗗] 𝘁𝗼𝗸𝗲𝗻𝘀, 𝗖𝗼𝗱𝗲→𝗧𝗲𝘅𝘁, and 𝗜𝗻𝗳𝗼𝗡𝗖𝗘 variants.

🧩 𝐖𝐡𝐚𝐭 𝐢𝐬 𝐋𝐋𝐌-𝐉𝐄𝐏𝐀?
If you’re seeing this for the first time: LLM-JEPA introduces the 𝗝𝗼𝗶𝗻𝘁 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝘃𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 (𝗝𝗘𝗣𝗔) — a self-supervised learning paradigm proven in vision — as a 𝗿𝗲𝗴𝘂𝗹𝗮𝗿𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗹𝗼𝘀𝘀 for LLMs. Combined with next-token prediction, it enables models to:
🎯 Boost fine-tuning accuracy
🧠 Resist overfitting
🌱 Work in pretraining via 𝗽𝗮𝗿𝗮𝗽𝗵𝗿𝗮𝘀𝗲-𝗯𝗮𝘀𝗲𝗱 𝗝𝗘𝗣𝗔
🌀 Induce 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱 𝗹𝗮𝘁𝗲𝗻𝘁 𝗿𝗲𝗽𝗿𝗲𝘀𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻𝘀 seen in neither the base model nor normally fine-tuned models

🧪 The 𝘃𝟭 𝘄𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝘃𝗲𝗿𝘀𝗶𝗼𝗻 (accepted to NeurIPS 2025 UniReps + DL4C) received valuable feedback highlighting high compute cost, limited applications, and missing ablations — all fully addressed in this release. Huge thanks to the UniReps and DL4C reviewers for their constructive and insightful comments that helped shape v2.

It’s been a privilege to collaborate with Yann LeCun (NYU) and Randall Balestriero (Brown) — few experiences are more inspiring than working alongside the pioneers of modern deep and self-supervised learning.

The 𝗰𝗼𝗱𝗲 𝗶𝘀 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱, and we warmly invite others to experiment with it — and help explore this emerging frontier between 𝗝𝗘𝗣𝗔 𝗮𝗻𝗱 𝗟𝗟𝗠𝘀.

💻 Code: https://lnkd.in/eUX2b8iE
📄 Paper: https://lnkd.in/ers8_yzm

Together with Yann and Randall, we’re already exploring new variants and applications — and look forward to sharing more soon. Stay tuned!
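The random JEPA-loss dropout idea can be sketched in a few lines: on most training steps only the next-token-prediction loss is computed, and the extra forward passes behind the JEPA regularizer are skipped entirely. This is a minimal illustration, not the paper's implementation; the function names, the `lam` weight, the `keep_prob` value, and the 1/keep_prob rescaling are all assumptions.

```python
import random

def combined_loss(ntp_loss, jepa_loss_fn, lam=1.0, keep_prob=0.125, rng=random):
    """Sketch of random JEPA-loss dropout (illustrative, not the paper's code).

    jepa_loss_fn is called lazily, so on dropped steps the expensive extra
    forward passes never run, cutting overhead roughly in proportion to
    keep_prob. The 1/keep_prob rescaling keeps the regularizer's expected
    contribution equal to always-on training.
    """
    if rng.random() < keep_prob:
        return ntp_loss + lam * jepa_loss_fn() / keep_prob
    return ntp_loss
```

With `keep_prob=0.125`, the regularizer fires on roughly one step in eight, which is the kind of ratio that turns a large per-step overhead into a small amortized one.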
Self-Supervised Learning Methods
Summary
Self-supervised learning methods let computers teach themselves by discovering patterns in large amounts of unlabeled data, making it easier to build models without expensive manual labeling. These approaches are changing fields like image analysis, natural language processing, and medical research by empowering machines to generate their own training signals from the data itself.
- Spot and reduce bias: Pay attention to spurious correlations in your data and use dynamic sampling techniques to ensure your models learn fair and balanced features across all groups.
- Use unlabeled data: Take advantage of vast datasets without labels by training models with self-supervised techniques, which can reveal hidden patterns for tasks like object recognition and gene expression prediction.
- Simplify pretraining: Explore streamlined self-supervised frameworks like SimSiam and DINO, which make it easier to create powerful models without complex architecture or labeled examples.
-
A good ending to 2023 is sharing one of my favorite papers from our group (led by Weicheng Zhu at NYU Center for Data Science) on improving fairness in self-supervised learning, at the SSL stage and without relying on any known attributes of the samples or the downstream task. This is especially important as we move more and more toward using general-purpose SSL-pretrained foundation models as the basis for most tasks. Our focus is on making these foundation models more "fair".

Fairness has many definitions. Here, we focus on learning good (discriminative) features for all sub-cohorts during SSL pre-training. It is often the case that the variables describing data are correlated; e.g., age is correlated with various disease risks. If the correlation is strong, say disease prevalence is much higher among older people, the SSL-pretrained model may focus only on learning features of old age instead of features of diseases within the old-age group. This is called learning spurious correlations, and it has been observed during SSL. If we knew all these variables and their correlations upfront, we could use conditional SSL, but (1) we often work with truly unlabeled data during SSL, and (2) we don't know good features that describe our data in advance; otherwise we wouldn't need SSL.

We have observed that in SSL, for instance contrastive SSL, the samples that are *aligned* with the correlations (i.e., old AND sick) converge faster than those that are in *conflict* with them (i.e., old AND not sick, often a minority). These two kinds of samples have measurably different learning speeds. We use this insight and propose a resampling method that dynamically samples training examples with a probability negatively related to their learning speed. This reduces the impact of spurious correlations in the dataset by upsampling the examples that may conflict with them.

We show evidence that our approach leads to better features in settings that involve strong spurious correlations. I'm really excited about the implications of this work, and its extension to various data modalities and settings.

Here's our paper: Zhu, W., Liu, S., Fernandez-Granda, C. and Razavian, N., 2023. Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling. arXiv preprint arXiv:2311.16361. https://lnkd.in/eUPETV6m

#deeplearning #selfsupervisedLearning #foundationmodels #fairness #spuriousCorrelations #medicalai #aiinmedicine #happynewyear2024
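The dynamic resampling described above can be sketched as follows. This is an illustrative reading, not the paper's code: the softmax-over-negative-speeds weighting, the `temperature` parameter, and the function names are assumptions; the paper defines its own learning-speed measure and sampling rule.

```python
import math
import random

def sampling_weights(learning_speeds, temperature=1.0):
    """Turn per-example learning speeds into sampling probabilities.

    Fast learners (likely aligned with spurious correlations) are
    down-weighted; slow learners (likely in conflict with them) are
    up-weighted. A softmax over negative speeds is one simple way to
    make probability negatively related to learning speed.
    """
    logits = [-s / temperature for s in learning_speeds]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def resample(indices, learning_speeds, k, rng=random):
    """Draw k training indices with the speed-aware weights above."""
    return rng.choices(indices, weights=sampling_weights(learning_speeds), k=k)
```

In practice the learning speed itself would be estimated online, e.g. from how quickly each example's loss drops across epochs.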
-
Our new paper, “Understanding the Learning Phases in Self-Supervised Learning via Critical Periods”, has been accepted to The Fourteenth International Conference on Learning Representations (ICLR 2026). https://lnkd.in/eCb_Ny-s

The paper introduces critical learning periods (CP) for a systematic analysis of self-supervised learning (SSL) in training large vision transformer models. Contrary to the prevailing heuristic that longer pretraining of foundation models translates to better downstream performance, we discover a transferability trade-off across diverse SSL settings and show that intermediate checkpoints often yield stronger out-of-domain (OOD) generalization, whereas additional pretraining leads to overspecialization and primarily benefits in-domain (ID) performance. From this observation, we hypothesize that SSL progresses through learning phases that can be characterized through the lens of CP. We then propose a label-free criterion to identify CP-closure, enabling better checkpoint selection for OOD robustness and more.

Why this matters, particularly for large-scale foundation models:

▶️ The "Longer is Better" Fallacy: We’ve long operated under the belief that longer pretraining naturally leads to better foundation models. Our work shows there is actually a transferability trade-off. Past a certain point, a model might get better at its specific ID task, but its ability to handle OOD data starts to tank. Longer training isn’t always better; past that point it is just overspecialization.

▶️ Identifying the Pretraining "Sweet Spot": Looking at SSL through the lens of critical periods, we identify windows where models are in their most adaptable phase. It turns out that CP-closure marks the exact moment when the model has learned the most useful, general representations. If you keep training past this point, the model's "brain" starts to harden, making it less useful for anything other than the specific data it’s already seen.

▶️ An SSL Early-Stopping Button: Instead of guessing when to stop pretraining, we use the Fisher Information matrix to measure the model's plasticity in real time. This gives us a quantifiable way to see when a model is starting to overfit the pretext task. For anyone building large-scale models, this is a huge deal for compute efficiency—it tells you exactly when you've reached diminishing returns and are wasting compute resources.

▶️ Having the Best of Both: Usually, you have to choose between a model that is very accurate in-domain and one that generalizes well OOD. Our CP-guided self-distillation bridges that gap. By distilling the less brittle representations from the "sweet spot" checkpoint into the final checkpoint parameters, we obtain foundation models that mitigate overfitting to pretext tasks and remain highly adaptable to new, unseen scenarios.

Congratulations to the team: JangHyeon Lee, Philipe Ambrozio Dias, Yao-Yi Chiang, Dalton Lunga

#ICLR2026 #CriticalLearningPeriods #SSL
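The "early stopping button" idea can be sketched as a simple monitor over per-checkpoint plasticity estimates. This is a hypothetical illustration only: the peak-relative rule and the `drop_ratio` threshold are my assumptions, and the paper's actual CP-closure criterion (built on the Fisher Information matrix) is more involved.

```python
def detect_cp_closure(plasticity_trace, drop_ratio=0.5):
    """Sketch of a CP-closure detector (illustrative assumption, not the
    paper's criterion).

    plasticity_trace: one plasticity estimate per checkpoint, e.g. the
    trace of the Fisher Information matrix. We flag the first checkpoint
    where plasticity has fallen below drop_ratio * peak-so-far, i.e.
    where the model's "brain" has measurably started to harden.
    """
    peak = float("-inf")
    for i, p in enumerate(plasticity_trace):
        peak = max(peak, p)
        if p < drop_ratio * peak:
            return i  # candidate boundary past the "sweet spot"
    return len(plasticity_trace) - 1  # never closed within this run
```

A training loop could call this after each checkpoint and stop pretraining (or bookmark the preceding checkpoint for distillation) as soon as a closure index is returned.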
-
🧪 New Machine Learning Research: Is ImageNet Outperformed by a Single Video?

How can a single long, unlabeled video rival the famous ImageNet dataset? Groundbreaking research by Shashank Venkataramanan (Inria), Mamshad Nayeem Rizve (University of Central Florida), João Carreira (Google DeepMind), Yuki Asano (University of Amsterdam), and Yannis Avrithis (Institute of Advanced Research in Artificial Intelligence, IARAI) explores this possibility through their method, DORA.

- Research goal: The study aims to demonstrate that egocentric videos, like virtual "Walking Tours," can train image encoders as effectively as large-scale datasets like ImageNet.
- Research methodology: The researchers introduced DORA, a self-supervised method that learns from 10 long, high-resolution videos (averaging 1.5 hours each). The method tracks and identifies objects over time, using cross-attention in vision transformers.
- Key findings: DORA pretrained on a single Walking Tours video achieved 44.5% accuracy on ImageNet classification tasks, outperforming existing models trained on curated datasets. It also improved object discovery scores to 56.2% on Pascal VOC 2012, a 5% gain over traditional methods.
- Practical implications: The method is well suited to tasks requiring data efficiency, such as urban scene understanding and autonomous-vehicle navigation. With minimal labeled data, DORA offers an accessible approach to developing robust visual encoders.

#LabelYourData #MachineLearning #Innovation #AIResearch #MLResearch #ComputerVision #SelfSupervisedLearning #ImageNet #DataAnnotation
-
Self-supervision continues to revolutionize pre-training strategies in foundation models, offering efficient ways to learn representations without labeled data. This week, our blog post explores SimSiam, which stands out as a simple yet powerful approach to representation learning.

What’s covered?
✅ A detailed discussion of SimSiam’s architecture and algorithm, which simplifies pretraining by removing the need for negative pairs and momentum encoders.
✅ A thorough investigation of SimSiam’s non-collapsing dynamics.
✅ SimSiam's implicit optimization dynamics.

To make the exploration practical, the blog post also includes pretraining code for SimSiam and code for fine-tuning its learned representations on a downstream classification task. This is a must-read for those eager to understand and implement self-supervised learning!

Read the full article here: https://buff.ly/3Wadooe
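For readers who want the core idea before the full post: SimSiam's objective is the symmetrized negative cosine similarity between one view's predictor output and a stop-gradient copy of the other view's projection. A minimal plain-Python sketch of the loss arithmetic (in an autograd framework, `z1` and `z2` would be detached to realize the stop-gradient; here only the numbers are shown):

```python
import math

def _cos(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized SimSiam objective: each view's predictor output p is
    pulled toward the (stop-gradient) projection z of the other view.
    Ranges from -1 (views perfectly aligned) to +1."""
    return -0.5 * (_cos(p1, z2) + _cos(p2, z1))
```

It is the combination of the predictor head and the stop-gradient, not negative pairs or a momentum encoder, that keeps this loss from collapsing to a constant representation.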
-
*** Self-Supervised Learning Models: An In-Depth Overview ***

Self-Supervised Learning (SSL) models represent a transformative approach in machine learning, allowing systems to extract meaningful patterns and insights from vast amounts of unlabeled data. Unlike traditional machine learning methods that rely heavily on meticulously labeled datasets, SSL techniques empower models to generate their own supervisory signals, effectively creating pseudo-labels directly from the inherent features of the data itself. This capability significantly enhances the scalability of the learning process, making it more efficient and less reliant on human intervention.

The beauty of SSL lies in its capacity to harness the abundance of unlabeled data available, which is particularly advantageous in deep learning scenarios where such data is often plentiful. Labeling this data can be cumbersome and costly, but SSL mitigates these challenges, enabling models to learn rich representations autonomously.

Several pivotal techniques form the foundation of SSL:

1. **Contrastive Learning:** This technique sharpens the model's ability to differentiate between similar and dissimilar data points. By identifying what makes examples alike or distinct, the model develops a nuanced understanding of the data distribution.

2. **Predictive Learning:** In this approach, the model predicts missing or obscured parts of the input data. For instance, it might forecast the next word in a sequence or infer the orientation of an object within an image. This predictive capability is essential for building robust representations.

3. **Autoassociative Learning:** Through this method, neural networks aim to reconstruct their original input data. In doing so, they learn to capture meaningful and abstract representations that can later be useful for various downstream tasks.
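The contrastive technique in item 1 is commonly instantiated as the InfoNCE loss: the anchor's similarity to its positive view is scored against its similarities to negatives. A minimal sketch, where the function name and `temperature` value are illustrative choices:

```python
import math

def info_nce(sim_pos, sims_neg, temperature=0.1):
    """Sketch of the InfoNCE contrastive objective.

    sim_pos:  similarity of an anchor to its positive (augmented) view.
    sims_neg: similarities of the anchor to negative examples.
    The loss is the cross-entropy of picking the positive out of the
    candidate set; it shrinks as the positive stands out from negatives.
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sims_neg]
    m = max(logits)  # log-sum-exp with max subtracted for stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - sim_pos / temperature
```

The small `temperature` sharpens the distinction: raising similarity to the positive while lowering it to negatives drives the loss toward zero.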
SSL's applications are widespread and impactful, with successful implementation across various fields, such as natural language processing (NLP), computer vision, and speech recognition. Prominent companies like Meta have embraced these self-supervised techniques to enhance the efficiency and accuracy of their speech processing systems, illustrating the transformative potential of SSL in real-world applications. --- B. Noted