
    Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

    ICCV 2025

[Figure: the BBox-Mask-Pose loop]

Paper     Website     License     Video

Papers with Code:

2D Pose AP on OCHuman: 42.5 · Human Instance Segmentation AP on OCHuman: 34.0

Important

The new version of BBox-Mask-Pose (BMPv2) is now available on arXiv. BMPv2 significantly improves performance; see the quantitative results reported in the preprint. One of the key contributions is PMPose, a new top-down pose estimation model that is strong both on standard benchmarks and in crowded scenes. The code will be added to the BMP-v2 branch in the coming weeks and gradually merged into main as well as into the online demo.

📋 Overview

The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other. This approach enhances all three tasks simultaneously. Using segmentation masks instead of bounding boxes improves performance in crowded scenarios, making top-down methods competitive with bottom-up approaches.
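To make the loop concrete, here is a minimal, hypothetical sketch of the iteration structure; detect(), estimate_pose(), and refine_mask() are placeholder stubs standing in for the detector, MaskPose, and SAM2, not this repository's actual API:

# Hypothetical sketch of the BBox-Mask-Pose loop; the three helpers below
# are placeholder stubs, NOT this repository's API.

def detect(image, ignore):
    """Placeholder: return person boxes, skipping already-masked regions."""
    return []

def estimate_pose(image, box, mask=None):
    """Placeholder: top-down pose estimation, optionally mask-conditioned."""
    return {"keypoints": []}

def refine_mask(image, prompts):
    """Placeholder: SAM2-style mask refinement prompted by keypoints."""
    return None

def bmp_loop(image, num_iterations=3):
    instances = []
    for _ in range(num_iterations):
        # 1. Detect people not yet explained by existing instance masks.
        boxes = detect(image, ignore=[i["mask"] for i in instances])
        for box in boxes:
            # 2. Estimate pose conditioned on the detection (MaskPose also
            #    uses the instance mask once one is available).
            pose = estimate_pose(image, box)
            # 3. Refine the instance mask using the estimated keypoints.
            mask = refine_mask(image, prompts=pose["keypoints"])
            instances.append({"box": box, "pose": pose, "mask": mask})
    return instances

Each pass feeds its outputs to the next, which is what makes the circle "virtuous" on crowded images.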

Key contributions:

  1. MaskPose: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters (a sketch of the idea follows this list)
    • Download pre-trained weights below
  2. BBox-MaskPose (BMP): a method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation, and pose estimation
    • Try the demo!
  3. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
    • Download pre-trained weights below
  4. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
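For intuition on the mask conditioning, one parameter-free way to feed a mask to a top-down estimator is to attenuate pixels outside the instance mask in the input crop. The sketch below illustrates that idea; it is an assumption made for illustration, not necessarily the paper's exact recipe:

import numpy as np

def mask_conditioned_crop(image, box, mask, dim_factor=0.3):
    """Illustrative only: crop one instance and dim out-of-mask pixels so a
    standard top-down pose model receives the mask signal without any new
    parameters. image: HxWx3 uint8, box: (x1, y1, x2, y2), mask: HxW bool."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2].astype(np.float32)
    crop_mask = mask[y1:y2, x1:x2]
    # Overlapping neighbours dominate crowded crops; dimming them keeps the
    # model focused on the target instance.
    crop[~crop_mask] *= dim_factor
    return crop.astype(np.uint8)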

For more details, please visit our project website.

📢 News

  • Aug 2025: HuggingFace Image Demo is out! 🎮
  • Jul 2025: Version 1.1 with an easy-to-run image demo released
  • Jun 2025: Paper accepted to ICCV 2025! 🎉
  • Dec 2024: Code released
  • Nov 2024: Project website launched

🚀 Installation

Docker Installation (Recommended)

The fastest way to get started with GPU support:

# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build

# Run the demo
docker-compose up

Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.
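If docker-compose up fails on GPU access, a quick way to sanity-check the NVIDIA Container Toolkit is to run nvidia-smi inside a stock CUDA image (any CUDA 12.1 base image works):

# Should print your GPU table if the toolkit is configured correctly
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi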

Manual Installation

This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.

Basic installation steps:

# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose

# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80

# Install MMLibrary
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"

# Install dependencies
pip install -r requirements.txt
pip install -e .
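After installation, a quick sanity check confirms that the pinned versions resolved and that CUDA is visible (run in a Python shell):

# Post-install sanity check for the pinned dependency versions
import torch, mmcv, mmdet, mmengine

print("torch:", torch.__version__, "| CUDA:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)        # expected: 2.1.0
print("mmdet:", mmdet.__version__)      # expected: 3.3.0
print("mmengine:", mmengine.__version__)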

🎮 Demo

Step 1: Download SAM2 weights using the enclosed script.

Step 2: Run the full BBox-Mask-Pose pipeline on an input image:

python demo/bmp_demo.py configs/bmp_D3.yaml data/004806.jpg

It takes the image 004806.jpg from OCHuman and runs (1) the detector, (2) the pose estimator, and (3) SAM2 mask refinement. Details are in the configuration file bmp_D3.yaml.

Options:

  • configs/bmp_D3.yaml: BMP configuration file
  • data/004806.jpg: Input image
  • --device: (Optional) Inference device (default: cuda:0)
  • --output-root: (Optional) Directory to save outputs (default: demo/outputs)
  • --create-gif: (Optional) Generate an animated GIF of all iterations (default: False)
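For example, combining the documented options to run on a second GPU, write results elsewhere, and produce the iteration GIF:

# All flags below are the documented options above
python demo/bmp_demo.py configs/bmp_D3.yaml data/004806.jpg \
    --device cuda:1 \
    --output-root my_outputs \
    --create-gif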

After running, outputs are saved to demo/outputs/004806/ (under the default --output-root). The expected output should look like this:

[Figure: detection results and pose results]

📦 Pre-trained Models

Pre-trained models are available on VRG Hugging Face 🤗. To run the demo, you only need to download the SAM weights with the enclosed script; our detector and pose estimator are downloaded automatically at runtime.

If you want to download our weights yourself, here are the links to our HuggingFace:
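For scripted downloads, a minimal huggingface_hub sketch follows; the repo_id and filename are placeholders, substitute the actual paths from the VRG Hugging Face links:

# Placeholder repo_id/filename: substitute the actual VRG Hugging Face
# repository and checkpoint names.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="<vrg-org>/<model-repo>",
    filename="<checkpoint>.pth",
)
print("Downloaded to:", checkpoint_path)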

๐Ÿ™ Acknowledgments

The code combines MMDetection, MMPose 2.0, ViTPose and SAM 2.1.

๐Ÿ“ Citation

The code was implemented by Miroslav Purkrábek. If you use this work, please cite it using the reference below.

For questions, please use Issues or Discussions.

@InProceedings{Purkrabek2025ICCV,
    author    = {Purkrabek, Miroslav and Matas, Jiri},
    title     = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {9004-9013}
}
