Submitted by melisa 125 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning · 8 authors 4
Submitted by zelaix 48 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments · 8 authors 1
Submitted by BestWishYsh 46 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation · 12 authors 1
Submitted by xyliu6 42 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis · 6 authors 1
Submitted by OrlandoHugBot 42 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs · 9 authors 2
Submitted by ganlinyang 27 Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces · 18 authors 4
Submitted by luojunyu 27 FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation · 13 authors 2
Submitted by qizekun 26 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models · 8 authors 1
Submitted by Cynthia-1628 25 OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation · 9 authors 1
Submitted by AnonMegumi 19 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs · 9 authors 1
Submitted by wchengad 18 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers · 8 authors 1
Submitted by vangard703 16 Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics · 6 authors 1
Submitted by Lingaaaaaaa 15 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning · 5 authors 1
Submitted by liyz 14 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation · 6 authors 1
Submitted by yiren98 11 RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers · 5 authors 1
Submitted by erjui 10 PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models · 5 authors 2
Submitted by Hila 8 FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation · 4 authors 1
Submitted by ChenyangSi 7 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation · 7 authors 1
Submitted by gentaiscool 7 Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability · 20 authors 1
Submitted by arkimjh 4 ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding · 4 authors 1
Submitted by chs20 3 FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens · 4 authors 1
Submitted by hyungjoochae 3 One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL · 9 authors 1
Submitted by danielmisrael 3 Accelerating Diffusion LLMs via Adaptive Parallel Decoding · 3 authors 1
Submitted by gq2138 3 SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL · 7 authors 1
Submitted by amazingj 2 M^3FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset · 6 authors 1
Submitted by zhaoruiyang 2 Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework · 8 authors 1
Submitted by Omartificial-Intelligence-Space 2 QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation · 7 authors 1
Submitted by lyan62 2 Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation · 6 authors 1
Submitted by jamescai20 2 How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning · 4 authors 1
Submitted by xyzhang626 2 Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding · 7 authors 1
Submitted by WeiChow 1 MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query · 18 authors 1
Submitted by GSean 1 Controllable Human-centric Keyframe Interpolation with Generative Prior · 5 authors 1
Submitted by lx865712528 1 TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression · 15 authors 1
Submitted by ItamarZ 1 Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability · 4 authors 2
Submitted by dxlong2000 1 Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines · 8 authors 1
Submitted by anumafzal94 1 Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion · 4 authors 1