
Accelerate post-training of end-to-end autonomous vehicle stacks with vector search and retrieval for large video datasets.

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Securely extract, embed, and index multimodal data with in-use encryption for fast, accurate semantic search.

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Elevate Shopping Experiences Online and In Stores.

Stable Diffusion 3.5 is a popular text-to-image generation model.

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

An edge computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

Multimodal question-answer retrieval representing user queries as text and documents as images.

Build advanced AI agents within the biomedical domain using the AI-Q Blueprint and the BioNeMo Virtual Screening Blueprint.

Multimodal vision-language model that understands text and images and creates informative responses.

FLUX.1-schnell is a distilled image generation model that produces high-quality images at fast speeds.

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

FLUX.1 is a state-of-the-art suite of image generation models.

Build a custom enterprise research assistant powered by state-of-the-art models that process and synthesize multimodal data, enabling reasoning, planning, and refinement to generate comprehensive reports.

Generate exponentially large numbers of synthetic motion trajectories for robot manipulation from just a few human demonstrations.

Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint—built on NeMo Retriever and Nemotron models—to connect your agents to trusted, authoritative sources of knowledge.

Multimodal vision-language model that understands text, images, and video and creates informative responses.

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

Context-aware chart extraction model that detects 18 classes of basic chart elements, excluding plot elements.

Advanced AI model that detects faces and identifies deepfake images.

Create intelligent virtual assistants for customer service across every industry.

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Robust image classification model for detecting and managing AI-generated content.

English text embedding model for question-answering retrieval.

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

Advanced text-to-image model for generating high-quality images.

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

EfficientDet-based object detection network that detects 100 specific retail objects in an input video.

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.