[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers (ICCV 2023)
Reproduction of semantic segmentation using a Masked Autoencoder (MAE)
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
[ECCV 2024] PyTorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
[CVPR'23 & TPAMI'25] Hard Patches Mining for Masked Image Modeling
Unofficial PyTorch implementation of Masked Autoencoders that Listen
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
[SIGIR'2023] "MAERec: Graph Masked Autoencoder for Sequential Recommendation"
Implementation of the proposed LVMAE, from the paper "Extending Video Masked Autoencoders to 128 Frames", in PyTorch
Multi-scale Transformer Network for Cross-Modality MR Image Synthesis (IEEE TMI)
[CVPR 2025] Official PyTorch implementation of MaskSub "Masking meets Supervision: A Strong Learning Alliance"
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
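Most of the repositories above build on the same core recipe: randomly mask a large fraction of patch tokens, encode only the visible ones, and reconstruct the rest. A minimal NumPy sketch of that random-masking step is below (the function name `random_masking` and the 75% ratio follow common MAE convention; this is an illustrative sketch, not code from any repo listed here).

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patch tokens.

    patches: (num_patches, dim) array of patch embeddings.
    Returns (visible, mask, restore_idx) where mask[i] == 1 marks a
    masked-out patch and restore_idx un-shuffles tokens for the decoder.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))

    # Shuffle patch indices; the first n_keep stay visible to the encoder.
    shuffle_idx = rng.permutation(n)
    keep_idx = shuffle_idx[:n_keep]
    visible = patches[keep_idx]

    # Binary mask in the original patch order (1 = masked).
    mask = np.ones(n, dtype=np.int64)
    mask[keep_idx] = 0

    # Inverse permutation: restores original order after the encoder,
    # so the decoder can insert mask tokens at the right positions.
    restore_idx = np.argsort(shuffle_idx)
    return visible, mask, restore_idx

# Example: 196 patches (a 14x14 grid) with 768-dim embeddings.
patches = np.zeros((196, 768))
visible, mask, restore_idx = random_masking(patches, mask_ratio=0.75)
print(visible.shape)  # (49, 768): only 25% of patches reach the encoder
```

The asymmetry this enables (a heavy encoder on 25% of tokens, a light decoder on all of them) is what makes MAE-style pretraining cheap; the video variants above (VideoMAE, LVMAE) apply the same idea over space-time tubes with even higher mask ratios.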