GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

Jiahao Yang, Zihan Wang, Xiangyang Li, Xing Zhu, Yujun Shen, Yinghao Xu†, Shuqiang Jiang†

🌟 Official implementation of GA-VLN, a geometry-aware bird's-eye-view (BEV) representation framework designed for Vision-Language Navigation (VLN).

Efficient BEV Representation: Compresses dense multi-view visual observations into a unified BEV space.
Robust Spatial Reasoning: Integrates a 3D foundation model to enhance geometry-aware perception.
Real-World Resilience: Demonstrates high robustness to sensor noise modeled after real-world errors.

⚙️ Requirements

1. Create environment

conda create -n gavln python=3.9
conda activate gavln
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

2. Install Habitat

conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab
pip install -e habitat-baselines
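
After both installation steps, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of the official repository: `check_versions` and `EXPECTED` are hypothetical names, and the distribution name under which `habitat-sim` registers its metadata is an assumption (conda builds may use a different name, in which case the package is reported as missing).

```python
from importlib import metadata
from typing import Dict

# Pins copied from the install commands above. The distribution name used for
# habitat-sim is an assumption; a conda build may register it differently.
EXPECTED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "habitat-sim": "0.2.4",
}


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Return a status per package: 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop a local version suffix such as "+cu121" before comparing.
        base = found.split("+", 1)[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

Run it inside the activated `gavln` environment. A `missing` entry for `habitat-sim` does not necessarily mean the install failed; importing `habitat_sim` directly is a more reliable check in that case.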

📦 Data & Checkpoints Preparation

Download the VLN-CE data and the SigLIP and VGGT backbones. Your directory tree should look like this:

GA-VLN/
├── vln_data/
│   ├── scene_datasets/
│   └── datasets/
├── model/
│   ├── siglip-so400m-patch14-384/
│   └── VGGT-1B/
├── checkpoints/
│   └── gavln_official/
└── ...
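
Before running evaluation, the layout above can be verified with a short script. This is a sketch under the assumption that it is run from the `GA-VLN/` root. `missing_dirs` and `REQUIRED_DIRS` are hypothetical names, not part of the repository. It only checks that the directories exist, not that their contents are complete.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
REQUIRED_DIRS = [
    "vln_data/scene_datasets",
    "vln_data/datasets",
    "model/siglip-so400m-patch14-384",
    "model/VGGT-1B",
    "checkpoints/gavln_official",
]


def missing_dirs(root: str, required: List[str] = REQUIRED_DIRS) -> List[str]:
    """Return the required sub-directories that do not exist under root."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("Directory layout looks complete.")
```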

📊 Evaluation

bash scripts/gavln_eval.sh

📖 Citation

@article{yang2026ga,
  title={GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation},
  author={Yang, Jiahao and Wang, Zihan and Li, Xiangyang and Zhu, Xing and Shen, Yujun and Xu, Yinghao and Jiang, Shuqiang},
  journal={arXiv preprint arXiv:2605.22036},
  year={2026}
}

✨ Acknowledgments

Our code is based on StreamVLN and VGGT. Thanks to the authors for their great work!
