
jahhaoyang/GA-VLN


GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

Jiahao Yang  Zihan Wang  Xiangyang Li  Xing Zhu  Yujun Shen  Yinghao Xu†  Shuqiang Jiang† 

Paper HuggingFace


🌟 Official implementation of GA-VLN, a geometry-aware bird's-eye-view (BEV) representation framework for Vision-Language Navigation (VLN).

  • Efficient BEV Representation: Compresses dense multi-view visual observations into a single, unified BEV space.
  • Robust Spatial Reasoning: Integrates a 3D foundation model (VGGT) to enable geometry-aware perception.
  • Real-World Resilience: Remains highly robust to sensor noise modeled on real-world errors.
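To make the first bullet concrete, here is a minimal NumPy sketch of one common way to build BEV features: lift a depth map to 3D points with a pinhole camera model, then average-pool per-point features onto a top-down grid. It illustrates the general technique only and is not GA-VLN's actual module. The OpenCV camera convention (x right, y down, z forward), the 10 m x 10 m grid, the 0.5 m cell size, and the function names `unproject_depth` and `points_to_bev` are all assumptions.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) metric depth map to (H*W, 3) camera-frame points.

    Assumes a pinhole camera with OpenCV axes: x right, y down, z forward.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices, shape (H, W)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def points_to_bev(points, feats, x_range=(-5.0, 5.0), z_range=(-5.0, 5.0), cell=0.5):
    """Average-pool per-point features onto a top-down (x, z) grid.

    The vertical axis (y) is collapsed and points outside the grid are dropped.
    Returns (rows, cols, C) mean features and (rows, cols) point counts;
    rows index z, columns index x.
    """
    points = np.asarray(points, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)
    n_cols = int(round((x_range[1] - x_range[0]) / cell))
    n_rows = int(round((z_range[1] - z_range[0]) / cell))

    # Map each point to a grid cell and discard points that fall outside.
    col = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    row = np.floor((points[:, 2] - z_range[0]) / cell).astype(int)
    keep = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)
    flat = row[keep] * n_cols + col[keep]

    sums = np.zeros((n_rows * n_cols, feats.shape[1]))
    counts = np.zeros(n_rows * n_cols)
    np.add.at(sums, flat, feats[keep])  # unbuffered scatter-add handles repeated cells
    np.add.at(counts, flat, 1.0)
    mean = sums / np.maximum(counts, 1.0)[:, None]  # empty cells stay zero
    return mean.reshape(n_rows, n_cols, -1), counts.reshape(n_rows, n_cols)
```

Averaging is just the simplest pooling choice for this sketch; sum or max pooling over each cell are equally common alternatives, and a learned model would typically fuse features from several views before or after this step.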

⚙️ Requirements

1. Create environment

conda create -n gavln python=3.9
conda activate gavln
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

2. Install Habitat

conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab
pip install -e habitat-baselines
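After both steps, a quick way to confirm that the pinned versions actually landed in the `gavln` environment is a small Python check. This is a hypothetical helper, not shipped with GA-VLN; the import names `habitat_sim`, `habitat`, and `habitat_baselines`, and the use of a module's `__version__` as a fallback when no package metadata is found (e.g. for conda installs), are assumptions.

```python
import importlib
import importlib.metadata as md

# distribution name -> (import name, pinned version); pins come from the commands above.
EXPECTED = {
    "torch": ("torch", "2.1.2"),
    "torchvision": ("torchvision", "0.16.2"),
    "habitat-sim": ("habitat_sim", "0.2.4"),
    "habitat-lab": ("habitat", "0.2.4"),
    "habitat-baselines": ("habitat_baselines", "0.2.4"),
}


def installed_version(dist, module):
    """Return the installed version string, or None if the package is absent."""
    try:
        return md.version(dist)  # works for pip / editable installs
    except md.PackageNotFoundError:
        pass
    try:
        mod = importlib.import_module(module)  # fallback for installs without metadata
    except ImportError:
        return None
    return getattr(mod, "__version__", "unknown")


def check(expected=EXPECTED):
    """Return {dist: (wanted, found)} for every package that does not match its pin."""
    problems = {}
    for dist, (module, want) in expected.items():
        got = installed_version(dist, module)
        # CUDA wheels report local versions such as "2.1.2+cu121"; compare the public part.
        if got is None or got.split("+")[0] != want:
            problems[dist] = (want, got)
    return problems


if __name__ == "__main__":
    issues = check()
    for dist, (want, got) in issues.items():
        print(f"{dist}: expected {want}, found {got}")
    print("OK" if not issues else f"{len(issues)} package(s) need attention")
```

Run it inside the activated `gavln` environment; an empty report means every pin matched.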

📦 Data & Checkpoints Preparation

Download the VLN-CE data and the SigLIP and VGGT backbones, then arrange them (together with the GA-VLN checkpoint) as follows:

GA-VLN/
├── vln_data/
│   ├── scene_datasets/
│   └── datasets/
├── model/
│   ├── siglip-so400m-patch14-384/
│   └── VGGT-1B/
├── checkpoints/
│   └── gavln_official/
└── ...
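Before running evaluation, a short Python check can confirm that the layout matches the tree. This is a hypothetical helper, not part of the repository: it only tests that the listed directories exist (not that their contents are complete), and it assumes it is run from the `GA-VLN/` root.

```python
from pathlib import Path

# Sub-directories taken from the expected tree above.
REQUIRED = [
    "vln_data/scene_datasets",
    "vln_data/datasets",
    "model/siglip-so400m-patch14-384",
    "model/VGGT-1B",
    "checkpoints/gavln_official",
]


def missing_dirs(root="."):
    """Return the required sub-directories that do not exist under `root`, in order."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    for rel in missing:
        print(f"missing: {rel}")
    print("layout OK" if not missing else f"{len(missing)} directory(ies) missing")
```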

📊 Evaluation

bash scripts/gavln_eval.sh

📖 Citation

@article{yang2026ga,
  title={GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation},
  author={Yang, Jiahao and Wang, Zihan and Li, Xiangyang and Zhu, Xing and Shen, Yujun and Xu, Yinghao and Jiang, Shuqiang},
  journal={arXiv preprint arXiv:2605.22036},
  year={2026}
}

✨ Acknowledgments

Our code is based on StreamVLN and VGGT. We thank their authors for their great work!

About

[CVPR 2026] Official implementation of "GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation"
