Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working at the intersection of multimodal learning, computer vision, and embodied AI. I build models that perceive, reason, and act in the physical world; most recently, I created Maya, a multilingual multimodal LLM.

My current interests include:

  • Spatial understanding in VLMs for real-world perception
  • Physics-aware world models
  • Multimodal learning
  • Simulation and embodied AI

Publications

  • Behind Maya: Building a Multilingual Vision-Language Model.
    Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
    arXiv · Google Scholar

  • Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
    Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayekh Islam.
    CVPR 2025 Workshop (ReGenAI), Oral.
    arXiv · Google Scholar

  • Embedding Geometries of Contrastive Language-Image Pre-Training.
    Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
    arXiv · Google Scholar

More at Google Scholar
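
For context on the last entry above, here is a minimal sketch of the standard CLIP objective that the embedding-geometries paper takes as its Euclidean baseline: symmetric InfoNCE over cosine similarities. This is the well-known baseline only; the paper's alternative geometries are not reproduced here.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over cosine similarities (standard CLIP baseline)."""
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))          # matched pairs sit on the diagonal
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random embeddings; real image/text encoders would produce these.
loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```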


Recent Projects

  • Maya: multilingual multimodal foundation model, presented at two CVPR workshops
  • Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot (data-loading sketch below)
  • GR00T-N1 Hackathon: bimanual robot manipulation with multimodal control
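
Since Gemma3n-VLA is built with LeRobot, here is a minimal sketch of the dataset interface that toolkit exposes; this is an illustration, not the project's actual training code. The import path follows lerobot's earlier `lerobot.common` layout and may differ in newer releases, and `lerobot/pusht` is just a small public dataset used as a stand-in.

```python
# Sketch of LeRobot's dataset API; not Gemma3n-VLA's training pipeline.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Any LeRobot-format dataset on the Hugging Face Hub works here.
dataset = LeRobotDataset("lerobot/pusht")
print(f"{dataset.num_episodes} episodes, {len(dataset)} frames")

frame = dataset[0]  # dict of tensors: camera images, robot state, action
for key, value in frame.items():
    print(key, getattr(value, "shape", value))
```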

🌐 Connect


Pinned

  1. maya: an instruction-finetuned multilingual multimodal model using Aya (usage sketch below). Python · 124 stars · 11 forks

  2. customer_bot: a simple chatbot using Rasa.ai. Python · 47 stars · 47 forks

  3. modnet_docker: Dockerized container for MODNet, a real-time portrait matting solution. Python · 13 stars · 4 forks

  4. LLaVA (forked from haotian-liu/LLaVA): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. Python · 5 stars · 15 forks
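
As referenced in the first pinned repo, a minimal inference sketch for a LLaVA-style checkpoint such as Maya. This is not the repo's documented entry point: the model id is a placeholder and the prompt follows the generic LLaVA-1.5 chat template; see the maya repo for the actual loading code.

```python
# Hedged sketch: assumes Maya ships as a LLaVA-style Hugging Face checkpoint.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "nahidalam/maya"  # PLACEHOLDER id; check the maya repo for the real one
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
prompt = "USER: <image>\nDescribe this photo in Bengali. ASSISTANT:"  # LLaVA-1.5-style template

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```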