Demo video: Demo.mp4
This project provides a unified Gradio interface that supports both:
- Multi-Domain Image Classification using pre-trained specialized models
- Zero-Shot Image Classification using Google's SigLIP and SigLIP2 models
It lets users run image classification across a wide variety of tasks and experiment with open-vocabulary classification through a simple web interface.
The multi-domain mode supports 18+ domains, including:
- Age classification
- Gender classification
- Emotion detection
- Deepfake quality assessment
- Dog breed classification
- Waste classification
- Food classification (Indian/Western)
- Traffic density
- Leaf disease detection (rice)
- Alphabet sign language detection
- Gym workout pose classification
- Bird species classification
- Clip Art 126, Painting 126, Sketch 126
- MNIST and Fashion MNIST
- Multi-source 121 classification
Each model is integrated via its own module for efficient classification using domain-specific architectures.
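For illustration, here is a minimal sketch of what one such domain module might look like. The module layout matches the project structure shown below, but the function name and model ID are placeholders, not the repo's actual code:

```python
# gender_classification.py -- illustrative sketch of a domain module.
from PIL import Image
from transformers import pipeline

_pipe = None  # lazily initialized so the model downloads only on first use

def classify_gender(image: Image.Image) -> dict:
    """Return {label: probability}, the format Gradio's gr.Label expects."""
    global _pipe
    if _pipe is None:
        # "<gender-model-id>" is a placeholder; the repo pins its own checkpoint.
        _pipe = pipeline("image-classification", model="<gender-model-id>")
    return {r["label"]: r["score"] for r in _pipe(image)}
```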
Zero-shot classification uses two open-vocabulary models:
- `google/siglip-so400m-patch14-384`
- `google/siglip2-so400m-patch14-384`
How it works:
- Accepts a user-uploaded image
- Takes a comma-separated list of labels
- Compares prediction probabilities using both SigLIP and SigLIP2 models
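Under the hood, SigLIP scores each image–label pair with an independent sigmoid rather than a softmax over labels. A minimal sketch of that computation with one of the checkpoints above, following standard Hugging Face usage (the image path and prompt template are illustrative):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip-so400m-patch14-384"  # same pattern applies to the SigLIP2 checkpoint
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")           # illustrative path
labels = ["cat", "dog", "horse"]            # from the comma-separated user input
texts = [f"a photo of a {l}" for l in labels]

inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
probs = torch.sigmoid(logits)[0]               # SigLIP uses sigmoid, not softmax
print({l: round(p.item(), 4) for l, p in zip(labels, probs)})
```

Running the same code against the SigLIP2 checkpoint and putting the two probability dicts side by side is essentially what the comparison view shows.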
Clone the repository:

```bash
git clone https://github.com/PRITHIVSAKTHIUR/SigLIP2-MultiDomain-App.git
cd SigLIP2-MultiDomain-App
```

Install the required packages with pip:

```bash
pip install -r requirements.txt
```
A minimal requirements.txt might include:

```
torch
transformers
gradio
Pillow
```
Run the app:

```bash
python app.py
```

This launches a Gradio interface in your default web browser.
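As a rough idea of how app.py might wire the domain modules to the UI, here is a simplified sketch; the imports and function names are hypothetical, matching the module layout shown under the project structure below:

```python
import gradio as gr
# Hypothetical imports; one classify function per domain module
from gender_classification import classify_gender
from dog_breed import classify_dog_breed

TASKS = {
    "Gender": classify_gender,
    "Dog Breed": classify_dog_breed,
    # ... one entry per domain module
}

def run(task_name, image):
    # Dispatch the uploaded image to the selected domain classifier
    return TASKS[task_name](image)

demo = gr.Interface(
    fn=run,
    inputs=[gr.Dropdown(choices=list(TASKS), label="Task"), gr.Image(type="pil")],
    outputs=gr.Label(num_top_classes=5),
    title="Multi-Domain / Zero-Shot Image Classification",
)

if __name__ == "__main__":
    demo.launch()
```

The actual app exposes the task list in a sidebar rather than a dropdown, but the dispatch pattern is the same.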
Multi-domain classification:
- Use the sidebar to choose your desired classification task (e.g., Age, Dog Breed, Deepfake).
- Upload an image.
- Click "Classify / Predict" to see the result.

Zero-shot classification:
- Upload an image.
- Enter labels separated by commas (e.g., "cat, dog, horse").
- Click "Run" to compare SigLIP and SigLIP2 outputs.
Project structure:

```
├── app.py                     # Main Gradio app
├── gender_classification.py   # Domain model modules
├── emotion_classification.py
├── dog_breed.py
├── deepfake_quality.py
├── ...
├── sketch_126.py
└── requirements.txt
```
Models:
- Domain-specific models (e.g., CNNs, ViTs) hosted locally or on the Hugging Face Hub
- SigLIP / SigLIP2 models from Google (used for zero-shot inference)
This project is licensed under the MIT License. See the LICENSE file for details.