Cloud Tensor Processing Units (TPUs)
Accelerate AI development and productionization with Google Cloud TPUs
Not sure if TPUs are the right fit? Learn about when to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads.
Overview
What is a Tensor Processing Unit (TPU)?
How do Tensor Processing Units (TPUs) work?
When should you use TPUs?
What are the advantages of TPUs?
- Faster time-to-value
Cloud TPUs accelerate training and inference of large neural network models without sacrificing model performance.
- Native support for AI frameworks
Build ML models on Cloud TPUs using popular ML frameworks, including TensorFlow, PyTorch, and JAX; a minimal JAX sketch follows this list. See quickstarts.
- Cost-efficient
Cloud TPUs feature matrix multiply units (MXUs) that accelerate the matrix operations common in deep learning models. Realize additional savings by combining them with preemptible Cloud TPU pricing.
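As a concrete illustration of that framework support, here is a minimal sketch of a jitted computation with JAX on a Cloud TPU VM. It assumes the jax[tpu] package is installed; the function and shapes are illustrative only, and the same code falls back to CPU or GPU on other hardware.

```python
# Minimal sketch: a jitted JAX computation on a Cloud TPU VM.
# Assumes `pip install "jax[tpu]"`; on other hardware, JAX falls
# back to CPU/GPU and the code runs unchanged.
import jax
import jax.numpy as jnp

print(jax.devices())  # lists TpuDevice entries on a TPU VM

@jax.jit  # compiled by XLA, which targets the TPU's matrix units
def predict(w, x):
    return jnp.tanh(x @ w)

x = jnp.ones((128, 256))
w = jnp.ones((256, 64))
print(predict(w, x).shape)  # (128, 64)
```

The point of the sketch is portability: the model code does not change between backends, which is what native framework support means in practice.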
How are TPUs different from GPUs?
A GPU is a specialized processor originally designed for manipulating computer graphics. Its parallel structure makes it well suited to algorithms that process the large blocks of data common in ML workloads. Learn more.
A TPU is an application-specific integrated circuit (ASIC) designed by Google for neural networks. TPUs possess specialized features, such as the matrix multiply unit (MXU) and a proprietary interconnect topology, that make them ideal for accelerating AI training and inference.
How are TPUs different from CPUs?
A CPU is a computer's main processor. CPUs are general-purpose processors based on the von Neumann architecture and are suited to running a wide variety of applications; their greatest benefit is flexibility. Learn more.
A TPU, by contrast, is an application-specific integrated circuit (ASIC) designed by Google for neural networks. As described above, its specialized features, such as the MXU and proprietary interconnect topology, make it ideal for accelerating AI training and inference.
How It Works
Cloud TPUs are ASICs (application-specific integrated circuits) optimized to process large matrix calculations frequently found in neural network workloads.
At the heart of each TPU is a matrix multiply unit (MXU), which features thousands of multiply-accumulators connected to form a systolic array architecture.
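To make the MXU's role concrete, the sketch below (our illustration, not Google's code) issues the kind of large matrix multiply that the XLA compiler lowers onto the systolic array; bfloat16 inputs with float32 accumulation mirror the MXU's native operating mode.

```python
# Illustrative sketch of the matrix multiplies the MXU accelerates.
# bfloat16 inputs with float32 accumulation match how the MXU
# natively computes; the shapes here are arbitrary assumptions.
import jax
import jax.numpy as jnp

a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

matmul = jax.jit(
    lambda a, b: jnp.dot(a, b, preferred_element_type=jnp.float32)
)
c = matmul(a, b)  # multiply in bfloat16, accumulate in float32
print(c.dtype, c.shape)  # float32 (1024, 1024)
```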
TPUs can be configured into Pods. Within a Pod, the latest-generation TPU v4 slices are connected by a custom interconnect that uses a 3D mesh topology.
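The sketch below shows, under stated assumptions, how a JAX program spreads an array across the chips of a slice; the mesh axis name and the shapes are our own choices, and a real Pod slice simply exposes however many devices it contains.

```python
# Hedged sketch: sharding a batch across the chips of a TPU slice
# with JAX's sharding API. The axis name "data" and all shapes are
# assumptions for illustration.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())  # all chips visible in the slice
mesh = Mesh(devices, axis_names=("data",))

x = jnp.ones((8 * len(devices), 1024))
# Shard the leading (batch) dimension across the "data" mesh axis.
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))

y = jax.jit(lambda x: jnp.sum(x ** 2))(x)  # runs across all chips
print(y)
```

On a Pod, the interconnect carries the cross-chip communication (here, the final reduction) directly between chips rather than through host networking, which is what lets large slices scale.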
Common Uses
Train custom LLMs
Use TPUs to train LLMs at scale
Cloud TPUs are Google's warehouse-scale supercomputers for machine learning. They are optimized for performance and scalability while minimizing total cost of ownership, making them ideally suited to training LLMs and generative AI models.
With the fastest training times on five MLPerf 2.0 benchmarks, Cloud TPU v4 Pods are the latest generation of accelerators, forming the world's largest publicly available ML hub with up to 9 exaflops of peak aggregate performance.
Talk to a specialist about using Cloud TPUs
Cloud TPU v4 helps sustainable agriculture
AI company InstaDeep successfully trained a large AI model with more than 20 billion parameters using Cloud TPU v4 to identify which genes make some crops more nutritious, more efficient to grow, and more resilient and resistant to pests, disease, and drought.
The TPU's cost-effective inter- and intra-chip communication capabilities enabled almost linear scaling between the number of chips and training time, allowing quick and efficient model training on a grid of 1,024 TPU v4 cores (512 chips).
Read the full case study for technical details
