FCViT

Solving Jigsaw Puzzles by Predicting Fragment’s Coordinate Based on Vision Transformer

Garam Kim^a, Hyeonseong Cho^a *, Hyoungsik Nam^a *

^a Kyung Hee University, Republic of Korea

(*) Corresponding Authors

ESWA 2025

FCViT: Fragment’s Coordinate prediction Vision Transformer

This repository contains the PyTorch training and evaluation code for FCViT.
Architecture of FCViT: (figure not shown)
For details, see "Solving Jigsaw Puzzles by Predicting Fragment’s Coordinate Based on Vision Transformer" by Garam Kim, Hyeonseong Cho, and Hyoungsik Nam.
If you use this code in a paper, please cite:

@article{kim2025solving,
title={Solving jigsaw puzzles by predicting fragment’s coordinate based on vision transformer},
author={Kim, Garam and Cho, Hyeonseong and Nam, Hyoungsik},
journal={Expert Systems with Applications},
volume={272},
pages={126776},
year={2025},
publisher={Elsevier}
}

Catalog

Overview of paper

Usage

First, clone the repository locally:

git clone https://github.com/HiMyNameIsDavidKim/fcvit.git

Then install PyTorch, torchvision, and timm==0.4.12:

conda install -c pytorch pytorch torchvision
pip install timm==0.4.12
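
After installing, it can help to confirm that the pinned timm version is the one actually present. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `check_environment` are hypothetical and not part of this repository, and it assumes the PyTorch distribution is registered under the name `torch`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(required=(("torch", None), ("torchvision", None), ("timm", "0.4.12"))):
    """Map each package name to 'ok', 'missing', or a version-mismatch message.

    `required` is a sequence of (package, pinned_version) pairs; a pinned
    version of None means any installed version is accepted.
    """
    report = {}
    for package, pinned in required:
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif pinned is not None and found != pinned:
            report[package] = f"found {found}, expected {pinned}"
        else:
            report[package] = "ok"
    return report
```

Calling `check_environment()` returns a dictionary with one entry per package, which can be printed before starting a long training run.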

Data preparation

Download the ImageNet train and val images from http://image-net.org/.
The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation images in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
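
Before launching evaluation or training, the layout above can be checked with a few lines of standard-library Python. This is a sketch under the assumption that each split contains one subfolder per class; `summarize_imagefolder` is a hypothetical helper, not a function from this repository or from torchvision.

```python
from pathlib import Path


def summarize_imagefolder(root):
    """Count class folders and files per split in an ImageFolder-style tree.

    Raises FileNotFoundError if train/ or val/ is missing, and ValueError
    if a split contains no class subfolders.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each immediate subdirectory is treated as one class.
        classes = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not classes:
            raise ValueError(f"no class folders under {split_dir}")
        n_files = sum(1 for c in classes for f in c.iterdir() if f.is_file())
        summary[split] = {"classes": len(classes), "files": n_files}
    return summary
```

For the example tree shown above, `summarize_imagefolder("/path/to/imagenet")` would report two classes and two files for each of train and val.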

Evaluation code

To evaluate FCViT-base on the ImageNet val set using a GPU:

python main_eval.py \
--eval \
--backbone vit_base_patch16_224 \
--size_puzzle 225 \
--size_fragment 75 \
--num_fragment 9 \
--batch_size 64 \
--resume FCViT_base_3x3_ep100_lr3e-05_b64.pt \
--data_path ${IMAGENET_DIR}
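
The three puzzle flags in the command above are related: a 225-pixel puzzle cut into 75-pixel fragments gives 3 fragments per side, hence 9 fragments in total. The sketch below illustrates that arithmetic and the idea of scoring predicted fragment coordinates against the true ones. It is plain Python for illustration only; the function names are hypothetical, and how the repository itself encodes coordinates or computes its metrics is not stated here.

```python
import random


def fragment_coordinates(size_puzzle, size_fragment, num_fragment):
    """Return the (x, y) top-left pixel coordinate of each fragment, row by row."""
    if size_puzzle % size_fragment != 0:
        raise ValueError("size_puzzle must be a multiple of size_fragment")
    per_side = size_puzzle // size_fragment
    if per_side * per_side != num_fragment:
        raise ValueError("num_fragment must equal (size_puzzle / size_fragment) ** 2")
    return [(col * size_fragment, row * size_fragment)
            for row in range(per_side) for col in range(per_side)]


def shuffle_fragments(coords, seed=None):
    """Return a shuffled copy: entry i is the true coordinate of the fragment in slot i."""
    rng = random.Random(seed)
    shuffled = list(coords)
    rng.shuffle(shuffled)
    return shuffled


def fragment_accuracy(predicted, target):
    """Fraction of fragments whose predicted coordinate equals the true one."""
    hits = sum(p == t for p, t in zip(predicted, target))
    return hits / len(target)
```

With the settings used in this README, `fragment_coordinates(225, 75, 9)` yields nine coordinates running from (0, 0) to (150, 150), and a prediction that matches the shuffled targets exactly scores 1.0.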

Training code

To train FCViT-base on ImageNet on a GPU for 100 epochs, run:

python main_train.py \
--backbone vit_base_patch16_224 \
--size_puzzle 225 \
--size_fragment 75 \
--num_fragment 9 \
--lr 3e-05 \
--epochs 100 \
--weight_decay 0.05 \
--batch_size 64 \
--data_path ${IMAGENET_DIR} \
--output_dir ${SAVE_DIR}

Acknowledgments

Our codebase is mainly based on JigsawCFN, MAE, ViT and timm.
This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2019R1F1A1061114) and the Brain Korea 21 Four Program in 2022.
