Garam Kim, Hyeonseong Cho\*, Hyoungsik Nam\*
Kyung Hee University, Republic of Korea
(*) Corresponding Authors
ESWA 2025
- This repository contains PyTorch training code and evaluation code for FCViT.
- Architecture of FCViT:

- For details, see *Solving Jigsaw Puzzles by Predicting Fragment’s Coordinate Based on Vision Transformer* by Garam Kim, Hyeonseong Cho, and Hyoungsik Nam.
- If you use this code for a paper, please cite:

```bibtex
@article{kim2025solving,
  title={Solving jigsaw puzzles by predicting fragment’s coordinate based on vision transformer},
  author={Kim, Garam and Cho, Hyeonseong and Nam, Hyoungsik},
  journal={Expert Systems with Applications},
  volume={272},
  pages={126776},
  year={2025},
  publisher={Elsevier}
}
```
- Overview of paper
- Usage
- Data preparation
- Evaluation code
- Training code
- License
- First, clone the repository locally:

```bash
git clone https://github.com/HiMyNameIsDavidKim/fcvit.git
```

- Then, install PyTorch and torchvision and `timm==0.4.12`:

```bash
conda install -c pytorch pytorch torchvision
pip install timm==0.4.12
```
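
As a quick sanity check of the environment, the minimal sketch below (assuming only the stock timm model registry, no FCViT-specific code) builds the same `vit_base_patch16_224` backbone referenced by the `--backbone` flag in the commands further down:

```python
import torch
import timm

# Build the ViT-Base/16 backbone by its timm name; pretrained=False avoids
# downloading ImageNet weights just for this check.
model = timm.create_model('vit_base_patch16_224', pretrained=False)

# A single dummy 224x224 RGB image should pass through without errors.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # torch.Size([1, 1000]) with the default classification head
```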
- Download ImageNet train and val images from http://image-net.org/.
- The directory structure is the standard layout for the torchvision
datasets.ImageFolder, and the training and validation data is expected to be in thetrain/folder andval/folder respectively: -
/path/to/imagenet/ train/ class1/ img1.jpeg class2/ img2.jpeg val/ class1/ img3.jpeg class2/ img4.jpeg
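
With the data laid out as above, the validation split can be read through torchvision's standard `ImageFolder`. The sketch below only illustrates that layout; it is not the repository's own data pipeline, `/path/to/imagenet` is a placeholder, and the 225x225 resize is an assumption mirroring the `--size_puzzle 225` flag used later:

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Placeholder root; point this at your own ${IMAGENET_DIR}.
imagenet_dir = '/path/to/imagenet'

# 225x225 lets a 3x3 grid of 75x75 fragments tile the image exactly.
transform = transforms.Compose([
    transforms.Resize((225, 225)),
    transforms.ToTensor(),
])

val_dataset = datasets.ImageFolder(f'{imagenet_dir}/val', transform=transform)
print(len(val_dataset), 'validation images in', len(val_dataset.classes), 'classes')
```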
- To evaluate FCViT-base on ImageNet val with a single GPU:

```bash
python main_eval.py \
    --eval \
    --backbone vit_base_patch16_224 \
    --size_puzzle 225 \
    --size_fragment 75 \
    --num_fragment 9 \
    --batch_size 64 \
    --resume FCViT_base_3x3_ep100_lr3e-05_b64.pt \
    --data_path ${IMAGENET_DIR}
```
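
The puzzle flags describe a 3x3 jigsaw: `--size_puzzle 225` and `--size_fragment 75` mean a 225x225 image is cut into `--num_fragment 9` fragments of 75x75 pixels, and the model predicts each fragment's original grid coordinate. The helper below is only a conceptual sketch of that geometry (a random shuffle with the ground-truth coordinates kept as targets); it is not the code path used by `main_eval.py`:

```python
import torch

def make_puzzle(image, size_fragment=75):
    """Cut a (3, 225, 225) image into nine shuffled 75x75 fragments.

    Returns the shuffled fragments and the (row, col) grid coordinate each
    fragment originally occupied, i.e. the target the model should predict.
    """
    grid = image.shape[-1] // size_fragment  # 3 for a 225-pixel puzzle
    fragments, coords = [], []
    for r in range(grid):
        for c in range(grid):
            fragments.append(image[:, r * size_fragment:(r + 1) * size_fragment,
                                      c * size_fragment:(c + 1) * size_fragment])
            coords.append((r, c))
    order = torch.randperm(len(fragments))
    fragments = torch.stack(fragments)[order]                 # (9, 3, 75, 75)
    coords = torch.tensor(coords, dtype=torch.float)[order]   # (9, 2)
    return fragments, coords

fragments, coords = make_puzzle(torch.randn(3, 225, 225))
print(fragments.shape, coords.shape)  # torch.Size([9, 3, 75, 75]) torch.Size([9, 2])
```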
- To train FCViT-base on ImageNet on a single GPU for 100 epochs, run:

```bash
python main_train.py \
    --backbone vit_base_patch16_224 \
    --size_puzzle 225 \
    --size_fragment 75 \
    --num_fragment 9 \
    --lr 3e-05 \
    --epochs 100 \
    --weight_decay 0.05 \
    --batch_size 64 \
    --data_path ${IMAGENET_DIR} \
    --output_dir ${SAVE_DIR}
```
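
For orientation only, the sketch below wires the hyperparameters from the command above (`--lr 3e-05`, `--weight_decay 0.05`) into a generic PyTorch optimizer and shows the shape of a simple coordinate-regression objective. AdamW and MSE are illustrative assumptions, not necessarily the paper's choices; FCViT's actual head, loss, and training loop are defined in `main_train.py` and the paper:

```python
import torch
import torch.nn as nn
import timm

# Backbone named by --backbone; FCViT's coordinate-prediction head is not
# reproduced here.
model = timm.create_model('vit_base_patch16_224', pretrained=False)

# Hyperparameters copied from the command above; the optimizer choice (AdamW)
# is an assumption for illustration.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-05, weight_decay=0.05)

# Standalone illustration of an MSE loss on dummy predicted vs. ground-truth
# (row, col) coordinates for 4 puzzles with 9 fragments each; in real training
# the predictions would come from the model above.
criterion = nn.MSELoss()
pred_coords = torch.randn(4, 9, 2)
target_coords = torch.randint(0, 3, (4, 9, 2)).float()
print(criterion(pred_coords, target_coords).item())
```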
- Our codebase is mainly based on JigsawCFN, MAE, ViT and timm.
- This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2019R1F1A1061114) and the Brain Korea 21 Four Program in 2022.