Important
The new version of BBox-Mask-Pose (BMPv2) is now available on arXiv.
BMPv2 significantly improves performance; see the quantitative results reported in the preprint.
One of the key contributions is PMPose, a new top-down pose estimation model that is already strong on standard benchmarks and in crowded scenes.
The code will be added to the BMP-v2 branch in the coming weeks and gradually merged into main and the online demo.
The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other. This approach enhances all three tasks simultaneously. Using segmentation masks instead of bounding boxes improves performance in crowded scenarios, making top-down methods competitive with bottom-up approaches.
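The loop described above can be sketched schematically. This is only an illustration of the control flow, assuming that each pass detects people, estimates a pose per detection, refines a mask from the box and keypoints, and then hides the processed instances before detecting again. All four callables (detect, estimate_pose, refine_mask, mask_out) and the num_iterations parameter are hypothetical placeholders, not the repository's API.

```python
from typing import Any, Callable, Dict, List


def bmp_loop(
    image: Any,
    detect: Callable[[Any], List[Any]],
    estimate_pose: Callable[[Any, Any], Any],
    refine_mask: Callable[[Any, Any, Any], Any],
    mask_out: Callable[[Any, List[Any]], Any],
    num_iterations: int = 2,
) -> List[Dict[str, Any]]:
    """Schematic detection -> pose -> mask loop (hypothetical placeholders)."""
    instances: List[Dict[str, Any]] = []
    current = image  # the view the detector sees; shrinks as people are masked out
    for _ in range(num_iterations):
        boxes = detect(current)
        if not boxes:
            break  # nothing new was found, so further passes cannot help
        new_instances = []
        for box in boxes:
            keypoints = estimate_pose(image, box)      # pose conditioned on the detection
            mask = refine_mask(image, box, keypoints)  # mask conditioned on box and pose
            new_instances.append({"bbox": box, "keypoints": keypoints, "mask": mask})
        instances.extend(new_instances)
        # Hide what was already processed so the next pass can find occluded people.
        current = mask_out(current, [inst["mask"] for inst in new_instances])
    return instances
```

In the actual pipeline the three roles are played by the detector, the pose estimator and SAM2; the number of passes and how the models are wired together are set in the BMP configuration file.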
Key contributions:
- MaskPose: a pose estimation model conditioned by segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
  - Download pre-trained weights below
- BBox-MaskPose (BMP): method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
  - Try the demo!
- Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
  - Download pre-trained weights below
- Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
For more details, please visit our project website.
- Aug 2025: HuggingFace Image Demo is out!
- Jul 2025: Version 1.1 with easy-to-run image demo released
- Jun 2025: Paper accepted to ICCV 2025!
- Dec 2024: The code is available
- Nov 2024: The project website is online
The fastest way to get started with GPU support:
# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build
# Run the demo
docker-compose up
Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.
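If you are unsure whether your Docker Engine meets the 19.03+ requirement, a small check like the following may help. It is a minimal sketch that parses the text printed by docker --version; the helper names docker_version_ok and check_local_docker are made up for this example and are not part of the repository. It does not verify the NVIDIA Container Toolkit or the GPU.

```python
import re
import shutil
import subprocess


def docker_version_ok(version_output: str, minimum=(19, 3)) -> bool:
    """Return True if the 'major.minor' version in the text is at least `minimum`."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        return False  # no version number found in the text
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum


def check_local_docker() -> bool:
    """Run `docker --version` locally and compare it against the requirement."""
    if shutil.which("docker") is None:
        return False  # Docker is not on PATH
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=False
    )
    return docker_version_ok(result.stdout)
```

Calling check_local_docker() before docker-compose build gives an early signal that the engine is too old or missing.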
This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.
Basic installation steps:
# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose
# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80
# Install the OpenMMLab libraries
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"
# Install dependencies
pip install -r requirements.txt
pip install -e .
Step 1: Download SAM2 weights using the enclosed script.
Step 2: Run the full BBox-Mask-Pose pipeline on an input image:
python demo/bmp_demo.py configs/bmp_D3.yaml data/004806.jpg
This takes the image 004806.jpg from OCHuman and runs (1) the detector, (2) the pose estimator and (3) SAM2 refinement. Details are in the configuration file bmp_D3.yaml.
Options:
- configs/bmp_D3.yaml: BMP configuration file
- data/004806.jpg: Input image
- --device: (Optional) Inference device (default: cuda:0)
- --output-root: (Optional) Directory to save outputs (default: demo/outputs)
- --create-gif: (Optional) Generate an animated GIF of all iterations (default: False)
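To run the demo from a script, for example over several images, the options above can be assembled into a command line. The following is a minimal sketch: build_demo_command is a hypothetical helper, not part of the repository, and it assumes that --create-gif is a boolean switch that takes no value.

```python
import sys
from typing import List, Optional


def build_demo_command(
    config: str,
    image: str,
    device: Optional[str] = None,
    output_root: Optional[str] = None,
    create_gif: bool = False,
) -> List[str]:
    """Assemble the argument list for demo/bmp_demo.py from the documented options."""
    cmd = [sys.executable, "demo/bmp_demo.py", config, image]
    if device is not None:
        cmd += ["--device", device]
    if output_root is not None:
        cmd += ["--output-root", output_root]
    if create_gif:
        cmd.append("--create-gif")  # assumption: a flag without a value
    return cmd


# Usage, from the repository root:
#   import subprocess
#   subprocess.run(build_demo_command("configs/bmp_D3.yaml", "data/004806.jpg"), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.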
After running, outputs are in outputs/004806/. The expected output should look like this:
Pre-trained models are available on VRG Hugging Face. To run the demo, you only need to download the SAM weights with the enclosed script. Our detector and pose estimator are downloaded automatically at runtime.
If you want to download our weights yourself, here are the links to our HuggingFace:
- ViTPose-b trained on COCO+MPII+AIC -- download weights
- MaskPose-b -- download weights
- Fine-tuned RTMDet-L -- download weights
The code combines MMDetection, MMPose 2.0, ViTPose and SAM 2.1.
The code was implemented by Miroslav Purkrábek. If you use this work, kindly cite it using the reference provided below.
For questions, please use the Issues or Discussions.
@InProceedings{Purkrabek2025ICCV,
author = {Purkrabek, Miroslav and Matas, Jiri},
title = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {9004-9013}
}
