TagScribeR v2 is a modern, GPU-accelerated local image captioning and dataset management suite. Rebuilt from the ground up on PySide6 and powered by Qwen 3-VL (Vision-Language) models (with optional API support), it offers a "Studio" workflow for preparing AI training datasets.
- 🖼️ Gallery Studio: Multi-select visual grid, instant tagging, and batch caption editing.
- 🤖 Qwen 3-VL Captioning: State-of-the-art vision model integration.
- GPU Accelerated: Supports NVIDIA (CUDA) and AMD (ROCm) on Windows.
- Real-time Preview: Watch captions appear as they generate.
- Custom Prompts: Use templates or natural language (e.g., "Describe the lighting in detail").
- API Mode: Connect to LM Studio, Ollama, or other API services to run other models, or to offload processing to another machine or the cloud.
- Batch Editor: Resize, Crop (with focus points), Rotate, and Convert formats in bulk.
- Dataset Manager: Create, sort, filter, and organize image collections without duplicating files manually.
- ℹ️ Metadata Editor: View and edit EXIF data, specifically targeting Stable Diffusion generation parameters.
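As background on what the Metadata Editor targets: A1111-style tools embed generation parameters in a PNG `tEXt` chunk keyed `parameters`, which can be read with the standard library alone. TagScribeR's own reader isn't shown here; this is an illustrative sketch (`png_text_chunks` and the demo bytes are not part of the app):

```python
import struct, zlib

def png_text_chunks(data: bytes) -> dict:
    """Parse all tEXt chunks from a PNG byte stream (keyword -> text)."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    out, pos = {}, 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")
            out[key.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return out

def chunk(ctype: bytes, body: bytes) -> bytes:
    """Frame one PNG chunk: length, type, data, CRC."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))

# Minimal demo "PNG": signature + tEXt("parameters", ...) + IEND
demo = (b"\x89PNG\r\n\x1a\n"
        + chunk(b"tEXt", b"parameters\x00masterpiece, 1girl\nSteps: 30, Sampler: Euler a")
        + chunk(b"IEND", b""))
params = png_text_chunks(demo)["parameters"]
print(params.splitlines()[0])  # -> masterpiece, 1girl
```

Real generation PNGs carry IHDR/IDAT chunks too; the parser above simply skips anything that isn't `tEXt`.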
- Python 3.10 or 3.11 installed.
- Git installed.
Clone the repository and run the installer:

```
git clone https://github.com/ArchAngelAries/TagScribeR.git
cd TagScribeR
install.bat
```

The installer will automatically detect your hardware:
- NVIDIA RTX 20/30/40: Installs the stable PyTorch build for CUDA 12.4.
- NVIDIA RTX 50 (Blackwell): Installs the nightly PyTorch build for CUDA 12.8 (cu128).
- AMD Radeon: Scans for your architecture (RX 7000, RX 9000, Strix Halo) and installs the correct ROCm Nightly build.
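Conceptually, the detection above maps the GPU name to a wheel index URL. The real `install.bat` logic isn't reproduced here, so treat this as an illustrative sketch (the function name and matching rules are assumptions):

```python
def pick_index_url(gpu_name: str) -> str:
    """Map a GPU marketing name to a likely PyTorch/ROCm wheel index (sketch)."""
    n = gpu_name.lower()
    if "rtx 50" in n:                # Blackwell -> CUDA 12.8 nightlies
        return "https://download.pytorch.org/whl/nightly/cu128"
    if "rtx" in n or "nvidia" in n:  # RTX 20/30/40 -> stable CUDA 12.4
        return "https://download.pytorch.org/whl/cu124"
    if "rx 90" in n:                 # RX 9000 series (gfx120X)
        return "https://rocm.nightlies.amd.com/v2/gfx120X-all/"
    if "rx 7" in n or "780m" in n:   # RX 7000 series / 780M (gfx110X)
        return "https://rocm.nightlies.amd.com/v2/gfx110X-all/"
    if "strix halo" in n:            # gfx1151
        return "https://rocm.nightlies.amd.com/v2/gfx1151/"
    raise ValueError(f"Unrecognized GPU: {gpu_name}")

print(pick_index_url("NVIDIA GeForce RTX 5090"))
# -> https://download.pytorch.org/whl/nightly/cu128
```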
If the auto-installer fails or you need a specific version, activate the venv (`.\venv\Scripts\activate`) and run the command for your hardware:
Standard (RTX 30/40):

```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

Bleeding Edge (RTX 50 Series):

```
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```

For AMD, find your architecture below. You must run both commands (the ROCm SDK first, then Torch).
RX 7000 Series / 780M (gfx110X):

```
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ "rocm[libraries,devel]"
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ --pre torch torchvision torchaudio
```

RX 9000 Series (gfx120X):

```
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ "rocm[libraries,devel]"
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre torch torchvision torchaudio
```

Strix Halo (gfx1151):

```
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ "rocm[libraries,devel]"
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
```

Workstation MI300 (gfx94X):

```
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx94X-dcgpu/ "rocm[libraries,devel]"
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx94X-dcgpu/ --pre torch torchvision torchaudio
```
⚠️ Important: Do not run `pip install torch` afterwards, or it will overwrite the AMD build with the generic CPU version.
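One way to confirm which build ended up installed: PyTorch wheels usually encode the backend in the version's local tag (e.g. `+rocm6.3`, `+cu124`, or `+cpu`/none for generic wheels). A small helper using illustrative version strings (in a live venv you would pass `torch.__version__`; this helper is not part of TagScribeR):

```python
def wheel_backend(version: str) -> str:
    """Classify a torch version string by its local build tag."""
    tag = version.partition("+")[2]
    if tag.startswith("cu"):
        return "cuda"
    if tag.startswith("rocm"):
        return "rocm"
    return "cpu"  # no tag or '+cpu' -> treat as a generic CPU wheel

# e.g. after a correct AMD nightly install:
print(wheel_backend("2.6.0.dev20250101+rocm6.3"))  # -> rocm
# ...and after an accidental 'pip install torch' overwrite:
print(wheel_backend("2.5.1"))                      # -> cpu
```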
TagScribeR works natively on Linux with full GPU acceleration.
- Run the install script:

```
chmod +x install.sh
./install.sh
```

- If the app fails to launch, you may need the Qt XCB library:

```
sudo apt-get install libxcb-cursor0
```

- Launch:

```
source venv/bin/activate
python main.py
```
If you have an RTX 5090 and get an error such as `RuntimeError: operator torchvision::nms does not exist` or `no kernel image available`:
This means the PyTorch nightly server shipped mismatched torch/torchvision builds (a common issue on the bleeding edge).
The Fix (Manual Transplant):
- If you have Fluxgym, Kohya_SS, or OneTrainer running successfully on your machine, go to that application's `venv\Lib\site-packages` folder.
- Copy the `torch`, `torchvision`, and `torchaudio` folders.
- Paste them into `TagScribeR\venv\Lib\site-packages`, overwriting existing files.
- Run this in the TagScribeR terminal to fix dependencies:

```
.\venv\Scripts\activate
pip install torchgen sympy networkx jinja2 fsspec pyyaml
```
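Nightly torch and torchvision generally need to come from the same nightly date and CUDA tag, which is what the transplant restores. A heuristic sanity check on the two version strings (illustrative only, not TagScribeR code; a date mismatch is a warning sign, not a guaranteed failure):

```python
def builds_match(torch_v: str, vision_v: str) -> bool:
    """Check that two nightly version strings share the same
    CUDA build tag and the same nightly date stamp."""
    def parts(v: str):
        base, _, tag = v.partition("+")
        date = base.rsplit("dev", 1)[-1] if "dev" in base else ""
        return date, tag
    t_date, t_tag = parts(torch_v)
    v_date, v_tag = parts(vision_v)
    return t_tag == v_tag and t_date == v_date

print(builds_match("2.6.0.dev20250105+cu128", "0.22.0.dev20250105+cu128"))  # -> True
print(builds_match("2.6.0.dev20250105+cu128", "0.22.0.dev20241230+cu128"))  # -> False
```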
- Go to the Auto Caption tab.
- Download a Model: Select a preset (e.g., Qwen 2.5-VL-3B) and click Download.
- Load Images: Open a folder containing your dataset.
- Select Images: Click individual images or "Select All".
- Run: Click "Caption Selected".
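Captions end up as same-named `.txt` sidecars next to each image, the usual training-dataset convention. A minimal sketch of that batch loop, with a stub standing in for the Qwen-VL call (`caption_folder` and the stub are illustrative, not TagScribeR's API):

```python
from pathlib import Path
import tempfile

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def caption_folder(folder: Path, caption_fn) -> int:
    """Write <name>.txt next to every image, using caption_fn(image_path)."""
    done = 0
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() in IMAGE_EXTS:
            img.with_suffix(".txt").write_text(caption_fn(img), encoding="utf-8")
            done += 1
    return done

# Demo with a stub instead of a vision model:
tmp = Path(tempfile.mkdtemp())
(tmp / "cat.png").write_bytes(b"")  # placeholder image file
n = caption_folder(tmp, lambda p: f"a photo, from {p.name}")
print(n, (tmp / "cat.txt").read_text(encoding="utf-8"))  # -> 1 a photo, from cat.png
```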
- Go to the Datasets tab.
- Create Collection: Click "New" to create a named folder in `Dataset Collections`.
- Filter Source: Load a source folder and type tags (e.g., `1girl, outdoors`) to find specific images.
- Add: Select the images and click "Add Selected to Collection". This copies the images and their text files safely.
- Themes: Choose from various Material Design themes (Dark Teal, Dark Amber, Light Blue, etc.).
- Defaults: Set your preferred AI temperature and token limits.
- GUI Framework: PySide6 & qt-material
- AI Backend: HuggingFace Transformers & Qwen-VL
- AMD Support: ROCm for Windows
Created by ArchAngelAries. Code Assisted by Google's Gemini Pro 3.