We build a CLIP-based Activity2Vec model to leverage the representation power of CLIP. Starting from the CLIP pre-trained model, we finetune it on our HAKE data with human body part state (PaSta) labels. The resulting model estimates all of the PaSta present in a whole image (as in the visualizations below). We believe it can serve as a more powerful image-level action semantic extractor for action understanding. If you have any advice, feel free to drop us an email!
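Conceptually, the finetuned model pairs the CLIP image encoder with one multi-label classifier per body part. The sketch below is illustrative only and is not the repository's code: the per-part class counts are placeholders, the head design is an assumption, and only the CLIP calls in the usage comment follow OpenAI's public `clip` package.

```python
# Illustrative sketch only -- NOT the repository's actual model code.
# Assumption: a CLIP visual embedding (512-d for ViT-B/16) feeds one sigmoid
# multi-label head per body part; the class counts below are placeholders,
# not the real HAKE PaSta label counts.
import torch
import torch.nn as nn

PLACEHOLDER_PART_CLASSES = {
    "foot": 16, "leg": 15, "hip": 6, "hand": 34, "arm": 8, "head": 14,
}

class PaStaHeadsSketch(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.heads = nn.ModuleDict(
            {part: nn.Linear(feat_dim, n) for part, n in PLACEHOLDER_PART_CLASSES.items()}
        )

    def forward(self, img_feat: torch.Tensor) -> dict:
        # img_feat: (B, feat_dim) image embeddings from the CLIP visual encoder
        return {part: torch.sigmoid(head(img_feat)) for part, head in self.heads.items()}

# Rough usage with OpenAI's CLIP package (pip install git+https://github.com/openai/CLIP.git):
#   import clip
#   from PIL import Image
#   model, preprocess = clip.load("ViT-B/16", device="cpu")
#   img = preprocess(Image.open("example.jpg")).unsqueeze(0)
#   with torch.no_grad():
#       feat = model.encode_image(img).float()         # (1, 512)
#   pasta_scores = PaStaHeadsSketch()(feat)            # per-part PaSta probabilities
```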
- Install PyTorch 1.5+ and torchvision 0.6+:
conda install -c pytorch pytorch torchvision
- Clone this repository and check out the CLIP-Activity2Vec branch:
git clone https://github.com/DirtyHarryLYL/HAKE-Action-Torch.git PaStaNetCLIP
cd PaStaNetCLIP
git checkout CLIP-Activity2Vec
- Download the HAKE dataset from here and extract it:
tar xzvf hake-large.tgz
- Organize your data as follows:
PaStaNetCLIP
|_ data
   |_ hake
      |_ hake-large
      |  |_ hico-train
      |  |_ hico-test
      |  |_ ...
- Download the annotations and put them under the $PROJECT/data/hake folder.
- Download the CLIP pretrained model from Baidu Pan or Google Drive and put it in $PROJECT/pretrained/clip (ViT-B-16.new.pt: the original CLIP pre-trained model; ckpt_4.pth: the model finetuned on HAKE data). A quick sanity check for the downloaded files is sketched right after this list.
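As an optional sanity check (not part of the repository's scripts), the snippet below verifies that the folders and checkpoint files named above exist and that a checkpoint can be deserialized; all paths are taken from the instructions above, and the fallback to torch.jit.load is an assumption in case a file is stored as a TorchScript archive.

```python
# Optional, hedged sanity check -- not part of the repository.
import os
import torch

# Directories named in the setup instructions above.
expected_dirs = [
    "data/hake/hake-large/hico-train",
    "data/hake/hake-large/hico-test",
    "pretrained/clip",
]
for d in expected_dirs:
    print(f"{d}: {'OK' if os.path.isdir(d) else 'MISSING'}")

# Checkpoint file names follow the download instructions above.
for ckpt in ["pretrained/clip/ViT-B-16.new.pt", "pretrained/clip/ckpt_4.pth"]:
    if not os.path.isfile(ckpt):
        print(f"{ckpt}: not found")
        continue
    try:
        state = torch.load(ckpt, map_location="cpu")
    except Exception:
        # Assumption: the file may be a TorchScript archive (as original CLIP weights are).
        state = torch.jit.load(ckpt, map_location="cpu")
    print(f"{ckpt}: loaded ({type(state).__name__})")
```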
- Backbone: ViT-B/16
- Data split: train / val
- Model weights: link
| Part / task | mAP |
|---|---|
| foot | 64.6 |
| leg | 76.3 |
| hip | 64.5 |
| hand | 44.7 |
| arm | 72.9 |
| head | 60.6 |
| binary | 81.0 |
| verb | 68.1 |
# by default, we use gpu x batch = 8 x 4, i.e., 8 GPUs with a per-GPU batch size of 4
# you can use --batch_size and --nproc_per_node to adjust the batch size and the number of GPUs
./run.sh # set --vis_test to visualize the predictions
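For reference, with one process per GPU the effective batch size is the product of the two flags above; the tiny sketch below just spells out that arithmetic and is not taken from run.sh.

```python
# Generic DDP batch-size arithmetic -- not the internals of run.sh.
nproc_per_node = 8   # default number of GPUs / processes (cf. --nproc_per_node)
batch_size = 4       # default per-GPU batch size (cf. --batch_size)
print("effective batch size:", nproc_per_node * batch_size)  # 32
```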
./test.sh

If you find our work useful, please consider citing:
@inproceedings{li2020pastanet,
  title={PaStaNet: Toward Human Activity Knowledge Engine},
  author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
  booktitle={CVPR},
  year={2020}
}
