LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
Updated Apr 24, 2025 (Python)
deepfake-detector-model-v1 is a vision-language encoder model fine-tuned from siglip2-base-patch16-512 for binary deepfake image classification. It is trained to detect whether an image is real or generated using synthetic media techniques. The model uses the SiglipForImageClassification architecture.
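The backbone names in these descriptions encode the patch size and input resolution (e.g. `siglip2-base-patch16-512`). Here is a minimal sketch, assuming the standard ViT-style patching these fixed-resolution checkpoints use: a square image of side R cut into non-overlapping P×P patches yields (R/P)² patch tokens. The function name is illustrative and not part of any library.

```python
def num_patch_tokens(resolution: int, patch_size: int) -> int:
    """Number of patch tokens for a square image split into non-overlapping patches.

    Assumes ViT-style patching with no padding, so the side length must be
    divisible by the patch size.
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    per_side = resolution // patch_size
    return per_side * per_side


# Backbones named in this topic listing (patch size 16 in every case).
for name, res in [
    ("siglip2-base-patch16-224", 224),  # 14 x 14 = 196 tokens
    ("siglip2-base-patch16-256", 256),  # 16 x 16 = 256 tokens
    ("siglip2-base-patch16-512", 512),  # 32 x 32 = 1024 tokens
]:
    print(name, num_patch_tokens(res, 16))
```

Because self-attention cost grows quadratically with sequence length, the 512-pixel backbone handles roughly five times as many tokens as the 224-pixel one. It therefore pays a much larger per-image compute cost, which it trades for finer spatial detail.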
Facial-Emotion-Detection-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
Age-Classification-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to predict the age group of a person from an image using the SiglipForImageClassification architecture.
Watermark-Detection-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to detect whether an image contains a watermark or not, using the SiglipForImageClassification architecture.
Augmented-Waste-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
siglip2-mini-explicit-content is an image classification vision-language encoder model fine-tuned from siglip2-base-patch16-512 for a single-label classification task. It is designed to classify images into categories related to explicit, sensual, or safe-for-work content using the SiglipForImageClassification architecture.
Human-Action-Recognition is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class human action recognition. It uses the SiglipForImageClassification architecture to predict human activities from still images.
SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
Anime-Classification-v1.0 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify anime-related images using the SiglipForImageClassification architecture.
nsfw-image-detection is a vision-language encoder model fine-tuned from siglip2-base-patch16-256 for multi-class image classification. Built on the SiglipForImageClassification architecture, the model is trained to identify and categorize the content type of an image, primarily to filter media as explicit, suggestive, or safe.
x-bot-profile-detection is a SigLIP2-based classification model designed to detect profile authenticity types on social media platforms (such as X/Twitter). It categorizes a profile image into four classes: bot, cyborg, real, or verified. It is built on google/siglip2-base-patch16-224.
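For a single-label model like this one, the classification head outputs one logit per class. Inference then reduces to a softmax followed by an argmax. The sketch below shows that decoding step in plain Python. The `ID2LABEL` order and the logits are hypothetical; the real mapping comes from the model's own config.

```python
import math

# Hypothetical id-to-label mapping; the actual order is defined by the
# model's config and may differ.
ID2LABEL = {0: "bot", 1: "cyborg", 2: "real", 3: "verified"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_single_label(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for one profile image, for illustration only.
label, p = predict_single_label([0.2, -1.0, 2.5, 0.1])
print(label, round(p, 3))
```

Subtracting the maximum logit before exponentiating does not change the result. It only prevents overflow when logits are large.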
Classify handwritten digits (0-9).
Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.
Fashion-Mnist-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into Fashion-MNIST categories using the SiglipForImageClassification architecture.
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.
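Multi-label classification differs from the single-label models above. Each label is scored independently with a sigmoid, and every label above a threshold is kept, so one image can receive several labels at once. In Hugging Face `transformers`, classification heads use this per-label objective (`BCEWithLogitsLoss`) when `config.problem_type` is `"multi_label_classification"`. The label set, logits, and 0.5 threshold below are illustrative assumptions, not the model's actual configuration.

```python
import math

# Hypothetical label set for illustration; the model's real labels come
# from its config.
LABELS = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_multi_label(logits, labels=LABELS, threshold=0.5):
    """Score each label independently and keep every label at or above threshold."""
    scores = {lab: sigmoid(z) for lab, z in zip(labels, logits)}
    active = sorted(lab for lab, s in scores.items() if s >= threshold)
    return active, scores


active, scores = predict_multi_label([-2.0, 1.3, -3.1, 2.2, 0.4, -0.7])
print(active)  # ['forest', 'mountain', 'sea']
```

The threshold does not have to be shared. Tuning it per label on validation data is a common refinement when labels have very different base rates.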
Deepfake vs Real is a dataset designed for image classification, distinguishing between deepfake and real images.
open-deepfake-detection is a vision-language encoder model fine-tuned from siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image is fake or real using the OpenDeepfake-Preview dataset. The model uses the SiglipForImageClassification architecture.
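Binary detectors such as this one can emit one logit per class (here assumed to be fake and real) and apply a softmax. With exactly two classes, that softmax probability equals the sigmoid of the logit difference: e^a / (e^a + e^b) = 1 / (1 + e^(b - a)). A single threshold on z_fake - z_real is therefore equivalent to a threshold on P(fake). The class order and the logits below are hypothetical.

```python
import math


def softmax2(z_fake: float, z_real: float) -> float:
    """P(fake) from a two-way softmax over (fake, real) logits."""
    m = max(z_fake, z_real)
    e_fake = math.exp(z_fake - m)
    e_real = math.exp(z_real - m)
    return e_fake / (e_fake + e_real)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# The two formulations agree for any pair of logits.
for z_fake, z_real in [(1.2, -0.3), (-2.0, 0.5), (0.0, 0.0)]:
    print(round(softmax2(z_fake, z_real), 6), round(sigmoid(z_fake - z_real), 6))
```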
Face-Mask-Detection is a binary image classification model based on google/siglip2-base-patch16-224, trained to detect whether a person is wearing a face mask or not. This model can be used in public health monitoring, access control systems, and workplace compliance enforcement.
Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.
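With 101 dishes, a single argmax can hide close alternatives, so showing the top few predictions is often more informative. The sketch below returns the k most probable classes under a softmax. The class names and logits are synthetic placeholders, not the model's actual labels or outputs.

```python
import heapq
import math


def top_k(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs under a softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)
    return [(id2label[i], probs[i]) for i in best]


# Hypothetical stand-in: 101 placeholder class names and synthetic logits.
id2label = {i: f"dish_{i}" for i in range(101)}
logits = [0.01 * i for i in range(101)]
logits[42] = 5.0
for label, p in top_k(logits, id2label, k=3):
    print(label, round(p, 3))
```

`heapq.nlargest` avoids sorting all 101 probabilities when only a few are needed. At this size, a plain `sorted(...)[:k]` would work equally well.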