Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.14141 (cs)

[Submitted on 15 Apr 2026 (v1), last revised 16 Apr 2026 (this version, v2)]

Title: Geometric Context Transformer for Streaming 3D Reconstruction

Authors: Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun, Liangxiao Hu, Nan Xue, Xing Zhu, Yujun Shen, Yao Yao, Yinghao Xu

Abstract: Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map is its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 × 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.

Comments:	Project page: this https URL Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2604.14141 [cs.CV]
	(or arXiv:2604.14141v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2604.14141

Submission history

From: Yinghao Xu
[v1] Wed, 15 Apr 2026 17:58:13 UTC (19,761 KB)
[v2] Thu, 16 Apr 2026 16:44:56 UTC (19,762 KB)
