Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding".

Python · 39 stars · 1 fork · Updated Jun 9, 2025

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python · 2,608 stars · 399 forks · Updated Feb 28, 2026
Python · 237 stars · 28 forks · Updated Apr 23, 2024

Mobile-Agent: The Powerful GUI Agent Family

Python · 7,619 stars · 772 forks · Updated Mar 1, 2026

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

Python · 97 stars · 3 forks · Updated Jan 29, 2024

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Python · 228 stars · 21 forks · Updated Jul 21, 2023