Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working at the intersection of multimodal learning, computer vision, and embodied AI. I build models that perceive, reason, and act in the physical world; most recently, I created Maya, a multilingual multimodal LLM.

My current interests include:

  • Spatial understanding in VLMs for real-world perception
  • Physics-aware world models
  • Multimodal learning
  • Simulation and embodied AI

Publications

  • Behind Maya: Building a Multilingual Vision-Language Model.
    Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
    arXiv · Google Scholar

  • Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
    Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayekh Islam.
    CVPR 2025 Workshop (ReGenAI), Oral.
    arXiv · Google Scholar

  • Embedding Geometries of Contrastive Language-Image Pre-Training.
    Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
    arXiv · Google Scholar

More at Google Scholar
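
For context on the last entry above, here is a minimal sketch of the standard CLIP objective that the embedding-geometries paper takes as its Euclidean baseline: symmetric InfoNCE over cosine similarities. This is the well-known baseline only; the paper's alternative geometries are not reproduced here.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over cosine similarities (standard CLIP baseline)."""
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))          # matched pairs sit on the diagonal
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random embeddings; real image/text encoders would produce these.
loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```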


Recent Projects

  • Maya: multilingual multimodal foundation model, presented at two CVPR workshops
  • Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot (data-loading sketch below)
  • GR00T-N1 Hackathon: bimanual robot manipulation with multimodal control
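
Since Gemma3n-VLA is built with LeRobot, here is a minimal sketch of the dataset interface that toolkit exposes; this is an illustration, not the project's actual training code. The import path follows lerobot's earlier `lerobot.common` layout and may differ in newer releases, and `lerobot/pusht` is just a small public dataset used as a stand-in.

```python
# Sketch of LeRobot's dataset API; not Gemma3n-VLA's training pipeline.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Any LeRobot-format dataset on the Hugging Face Hub works here.
dataset = LeRobotDataset("lerobot/pusht")
print(f"{dataset.num_episodes} episodes, {len(dataset)} frames")

frame = dataset[0]  # dict of tensors: camera images, robot state, action
for key, value in frame.items():
    print(key, getattr(value, "shape", value))
```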

🌐 Connect


Pinned

  1. maya: an instruction-finetuned multilingual multimodal model using Aya (usage sketch below). Python · 124 stars · 11 forks

  2. customer_bot: a simple chatbot using Rasa.ai. Python · 47 stars · 47 forks

  3. modnet_docker: Dockerized container for MODNet, a real-time portrait matting solution. Python · 13 stars · 4 forks

  4. LLaVA (forked from haotian-liu/LLaVA): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. Python · 5 stars · 15 forks
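
As referenced in the first pinned repo, a minimal inference sketch for a LLaVA-style checkpoint such as Maya. This is not the repo's documented entry point: the model id is a placeholder and the prompt follows the generic LLaVA-1.5 chat template; see the maya repo for the actual loading code.

```python
# Hedged sketch: assumes Maya ships as a LLaVA-style Hugging Face checkpoint.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "nahidalam/maya"  # PLACEHOLDER id; check the maya repo for the real one
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
prompt = "USER: <image>\nDescribe this photo in Bengali. ASSISTANT:"  # LLaVA-1.5-style template

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```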