Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

ICCV 2025

Important

The new version of BBox-Mask-Pose (BMPv2) is now available on arXiv. BMPv2 significantly improves performance; see the quantitative results reported in the preprint. One of the key contributions is PMPose, a new top-down pose estimation model that is already strong on standard benchmarks and in crowded scenes. The code will be added to the BMP-v2 branch in the following weeks and gradually merged into main and into the online demo.

📋 Overview

The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other. This approach enhances all three tasks simultaneously. Using segmentation masks instead of bounding boxes improves performance in crowded scenarios, making top-down methods competitive with bottom-up approaches.
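The loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `detect`, `estimate_pose`, and `refine_mask` are hypothetical callables standing in for the detector, the pose estimator, and the mask refinement step, and the stopping rule (a fixed iteration budget, or no new detections) is an assumption.

```python
import numpy as np


def bmp_loop(image, detect, estimate_pose, refine_mask, num_iterations=2):
    """Illustrative detect -> pose -> mask loop (all callables are hypothetical).

    detect(image)                   -> list of (bbox, mask) pairs
    estimate_pose(image, mask)      -> pose for the instance under `mask`
    refine_mask(image, pose, mask)  -> refined boolean mask of shape (H, W)
    """
    instances = []
    # Pixels already explained by an instance found in an earlier iteration.
    processed = np.zeros(image.shape[:2], dtype=bool)

    for _ in range(num_iterations):
        # Hide processed instances so the detector can focus on what is left.
        visible = image.copy()
        visible[processed] = 0

        detections = detect(visible)
        if not detections:
            break  # nothing new was found, so stop early

        for bbox, mask in detections:
            pose = estimate_pose(image, mask)         # pose conditioned on the mask
            refined = refine_mask(image, pose, mask)  # mask conditioned on the pose
            instances.append({"bbox": bbox, "pose": pose, "mask": refined})
            processed |= np.asarray(refined, dtype=bool)

    return instances
```

Each pass feeds the refined masks back into the next detection round, which is the sense in which the three tasks condition one another.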

Key contributions:

- MaskPose: a pose estimation model conditioned by segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
  - Download pre-trained weights below
- BBox-MaskPose (BMP): method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
  - Try the demo!
- Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
  - Download pre-trained weights below
- Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose
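To illustrate the difference between bounding-box and mask conditioning, the sketch below crops an instance and suppresses the pixels outside its mask. Zeroing the background is an assumption made for illustration only; the actual MaskPose input encoding may differ, and `mask_conditioned_crop` is a hypothetical helper that does not exist in the repository.

```python
import numpy as np


def mask_conditioned_crop(image, mask, bbox):
    """Crop `bbox` from `image` and zero every pixel outside `mask`.

    image: (H, W, C) array
    mask:  (H, W) boolean array marking the target instance
    bbox:  (x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive
    """
    x0, y0, x1, y1 = bbox
    crop = image[y0:y1, x0:x1].copy()            # copy so the input stays intact
    crop_mask = mask[y0:y1, x0:x1].astype(bool)
    crop[~crop_mask] = 0                         # hide neighbours inside the box
    return crop
```

With a plain bounding-box crop, a second person overlapping the box stays fully visible to the pose model; here only the pixels of the target instance remain.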

For more details, please visit our project website.

📢 News

Aug 2025: HuggingFace Image Demo is out! 🎮
Jul 2025: Version 1.1 with easy-to-run image demo released
Jun 2025: Paper accepted to ICCV 2025! 🎉
Dec 2024: The code is available
Nov 2024: The project website is online

🚀 Installation

Docker Installation (Recommended)

The fastest way to get started with GPU support:

# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build

# Run the demo
docker-compose up

Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.

Manual Installation

This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.

Basic installation steps:

# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose

# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80

# Install OpenMMLab libraries
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"

# Install dependencies
pip install -r requirements.txt
pip install -e .
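
After a manual installation, it can help to confirm that the environment matches the versions pinned above. The following is a small sketch using only the standard library; `check_versions` and `PINNED` are hypothetical names, and the assumption is that the packages are registered under the same distribution names used in the `pip` and `mim` commands.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the installation steps above.
PINNED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "numpy": "1.25.1",
    "opencv-python": "4.9.0.80",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmpretrain": "1.2.0",
}


def check_versions(pinned=PINNED, get_version=version):
    """Return {package: problem} for every package that deviates from `pinned`."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build tags such as "+cu121" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, issue in check_versions().items():
        print(f"{name}: {issue}")
```

An empty result means every pinned package was found at the expected version.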

🎮 Demo

Step 1: Download SAM2 weights using the enclosed script.

Step 2: Run the full BBox-Mask-Pose pipeline on an input image:

python demo/bmp_demo.py configs/bmp_D3.yaml data/004806.jpg

This takes the image 004806.jpg from OCHuman and runs (1) the detector, (2) the pose estimator, and (3) SAM2 refinement. Details are in the configuration file bmp_D3.yaml.

Options:

configs/bmp_D3.yaml: BMP configuration file
data/004806.jpg: Input image
--device: (Optional) Inference device (default: cuda:0)
--output-root: (Optional) Directory to save outputs (default: demo/outputs)
--create-gif: (Optional) Generate an animated GIF of all iterations (default: False)
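
To run the demo on several images, the command above can be assembled programmatically. The sketch below is not part of the repository: `build_demo_command` and `run_demo_on_folder` are hypothetical helpers, and it assumes that `--create-gif` is a boolean switch that takes no value and that the script is launched from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(image, config="configs/bmp_D3.yaml", device="cuda:0",
                       output_root="demo/outputs", create_gif=False):
    """Build the argument list for demo/bmp_demo.py using the options listed above."""
    cmd = [
        sys.executable, "demo/bmp_demo.py", config, str(image),
        "--device", device,
        "--output-root", str(output_root),
    ]
    if create_gif:
        # Assumption: the flag is a switch without a value.
        cmd.append("--create-gif")
    return cmd


def run_demo_on_folder(folder, **kwargs):
    """Run the demo once per .jpg file in `folder`, stopping at the first failure."""
    for image in sorted(Path(folder).glob("*.jpg")):
        subprocess.run(build_demo_command(image, **kwargs), check=True)
```

For example, `run_demo_on_folder("data", device="cpu")` would invoke the demo for each JPEG in `data/` in sorted order.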

After running, outputs are in outputs/004806/. The expected output should look like this:

📦 Pre-trained Models

Pre-trained models are available on VRG Hugging Face 🤗. To run the demo, you only need to download the SAM weights with the enclosed script. Our detector and pose estimator are downloaded automatically at runtime.

If you want to download our weights yourself, here are the links to our HuggingFace:

ViTPose-b trained on COCO+MPII+AIC -- download weights
MaskPose-b -- download weights
Fine-tuned RTMDet-L -- download weights

🙏 Acknowledgments

The code combines MMDetection, MMPose 2.0, ViTPose and SAM 2.1.

📝 Citation

The code was implemented by Miroslav Purkrábek. If you use this work, kindly cite it using the reference provided below.

For questions, please use the Issues or Discussions.

@InProceedings{Purkrabek2025ICCV,
    author    = {Purkrabek, Miroslav and Matas, Jiri},
    title     = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {9004-9013}
}
