orrzohar/README.md

I'm Orr Zohar 👋

My research focuses on Large Multimodal Models, with the aim of making these models more capable of understanding images and videos.

  • SmolVLM2: The tiniest video-LMM ever!
  • Apollo: Exploring video understanding in LMMs
  • 💫 Video-STaR: A self-training method that allows any labeled video dataset to be used for instruction tuning.
  • 🤖 VideoAgent: An agent-based system that uses a large language model to iteratively identify and compile crucial information from long-form videos.

Pinned

  1. huggingface/smollm Public

    Everything about the SmolLM and SmolVLM family of models

    Python · 3.5k stars · 249 forks

  2. Video-STaR Public

    [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

    Python · 72 stars · 6 forks

  3. huggingface/transformers Public

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Python · 154k stars · 31.6k forks

  4. PROB Public

    [CVPR 2023] Official PyTorch code for PROB: Probabilistic Objectness for Open World Object Detection

    Python · 142 stars · 16 forks

  5. FOMO Public

    Official PyTorch code for Open World Object Detection in the Era of Foundation Models

    Python · 89 stars · 6 forks

  6. LOVM Public

    [NeurIPS 2023] Official PyTorch code for LOVM: Language-Only Vision Model Selection

    Python 21