Cloud Tensor Processing Units (TPUs)
Accelerate AI development and productionization with Google Cloud TPUs
Not sure if TPUs are the right fit? Learn about when to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads.
Overview
What is a Tensor Processing Unit (TPU)?
How do Tensor Processing Units (TPUs) work?
When should you use TPUs?
What are the advantages of TPUs?
- Faster time-to-value
Cloud TPUs accelerate training and inference of large neural network models without sacrificing model performance.
- Native support for AI frameworks
Build ML models on Cloud TPUs using popular ML frameworks, including TensorFlow, PyTorch, and JAX; a minimal JAX sketch follows this list. See quickstarts.
- Cost-efficient
Cloud TPUs feature matrix multiply units (MXUs) that accelerate the matrix operations common in deep learning models. Realize additional savings by combining them with preemptible Cloud TPU pricing.
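As a concrete illustration of that framework support, here is a minimal sketch of a jitted computation with JAX on a Cloud TPU VM. It assumes the jax[tpu] package is installed; the function and shapes are illustrative only, and the same code falls back to CPU or GPU on other hardware.

```python
# Minimal sketch: a jitted JAX computation on a Cloud TPU VM.
# Assumes `pip install "jax[tpu]"`; on other hardware, JAX falls
# back to CPU/GPU and the code runs unchanged.
import jax
import jax.numpy as jnp

print(jax.devices())  # lists TpuDevice entries on a TPU VM

@jax.jit  # compiled by XLA, which targets the TPU's matrix units
def predict(w, x):
    return jnp.tanh(x @ w)

x = jnp.ones((128, 256))
w = jnp.ones((256, 64))
print(predict(w, x).shape)  # (128, 64)
```

The point of the sketch is portability: the model code does not change between backends, which is what native framework support means in practice.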
How are TPUs different from GPUs?
A GPU is a specialized processor originally designed for manipulating computer graphics. Its parallel structure makes it well suited to algorithms that process the large blocks of data common in ML workloads. Learn more.
A TPU is an application-specific integrated circuit (ASIC) designed by Google for neural networks. TPUs possess specialized features, such as the matrix multiply unit (MXU) and a proprietary interconnect topology, that make them ideal for accelerating AI training and inference.
How are TPUs different from CPUs?
A CPU is a computer's main processor. CPUs are general-purpose processors based on the von Neumann architecture and are suited to running a wide variety of applications; their greatest benefit is flexibility. Learn more.
A TPU, by contrast, is an application-specific integrated circuit (ASIC) designed by Google for neural networks. As described above, its specialized features, such as the MXU and proprietary interconnect topology, make it ideal for accelerating AI training and inference.
How It Works
Cloud TPUs are ASICs (application-specific integrated circuits) optimized to process large matrix calculations frequently found in neural network workloads.
At the heart of each TPU is a matrix multiply unit (MXU), which features thousands of multiply-accumulators connected to form a systolic array architecture.
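To make the MXU's role concrete, the sketch below (our illustration, not Google's code) issues the kind of large matrix multiply that the XLA compiler lowers onto the systolic array; bfloat16 inputs with float32 accumulation mirror the MXU's native operating mode.

```python
# Illustrative sketch of the matrix multiplies the MXU accelerates.
# bfloat16 inputs with float32 accumulation match how the MXU
# natively computes; the shapes here are arbitrary assumptions.
import jax
import jax.numpy as jnp

a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

matmul = jax.jit(
    lambda a, b: jnp.dot(a, b, preferred_element_type=jnp.float32)
)
c = matmul(a, b)  # multiply in bfloat16, accumulate in float32
print(c.dtype, c.shape)  # float32 (1024, 1024)
```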
TPUs can be configured into Pods. Within a Pod, the latest-generation TPU v4 slices are connected by a custom interconnect that uses a 3D mesh topology.
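The sketch below shows, under stated assumptions, how a JAX program spreads an array across the chips of a slice; the mesh axis name and the shapes are our own choices, and a real Pod slice simply exposes however many devices it contains.

```python
# Hedged sketch: sharding a batch across the chips of a TPU slice
# with JAX's sharding API. The axis name "data" and all shapes are
# assumptions for illustration.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())  # all chips visible in the slice
mesh = Mesh(devices, axis_names=("data",))

x = jnp.ones((8 * len(devices), 1024))
# Shard the leading (batch) dimension across the "data" mesh axis.
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))

y = jax.jit(lambda x: jnp.sum(x ** 2))(x)  # runs across all chips
print(y)
```

On a Pod, the interconnect carries the cross-chip communication (here, the final reduction) directly between chips rather than through host networking, which is what lets large slices scale.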
Common Uses
Train custom LLMs
Use TPUs to train LLMs at scale
Cloud TPUs are Google's warehouse-scale supercomputers for machine learning. They are optimized for performance and scalability while minimizing total cost of ownership, making them ideally suited to training LLMs and generative AI models.
With the fastest training times on five MLPerf 2.0 benchmarks, Cloud TPU v4 Pods are the latest generation of accelerators, forming the world's largest publicly available ML hub with up to 9 exaflops of peak aggregate performance.
Talk to a specialist about using Cloud TPUs
Cloud TPU v4 helps sustainable agriculture
AI company InstaDeep successfully trained a large AI model with more than 20 billion parameters using Cloud TPU v4 to identify which genes make some crops more nutritious, more efficient to grow, and more resilient and resistant to pests, disease, and drought.
The TPU's cost-effective inter- and intra-chip communication capabilities enabled almost linear scaling between the number of chips and training time, allowing quick and efficient model training on a grid of 1,024 TPU v4 cores (512 chips).
Read the full case study for technical details
