Why we need more than CNNs

From the course: Artificial Intelligence Foundations: Neural Networks

Earlier, we explored how convolutional neural networks became the backbone of computer vision. In this video, we'll explore how and why the field is shifting from CNNs toward transformer-based architectures for computer vision.

Some visual tasks require understanding relationships between distant regions: for example, matching a logo on one side of an image with text on the other, or recognizing that two far-apart objects belong to the same scene. Each convolutional layer compares only pixels within a small local window, so CNNs need many stacked layers before information from distant regions can meet. This makes them slower, harder to train, and sometimes less accurate for global reasoning.

This raises an important question. What if we could see everything at once? What if every part of the image could directly attend to every other part simultaneously? This example shows two cars in a parking lot scene. The cars are spatially distant. CNNs would need many…
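To make the contrast concrete, here is a minimal sketch in Python (not from the course). It assumes stride-1 convolutions with no pooling, striding, or dilation, even though real CNNs use those to grow their receptive fields faster. The hypothetical helpers `receptive_field` and `layers_needed` count how many 3×3 layers it takes before one output pixel can see two pixels a given distance apart. The hypothetical `self_attention` is a simplified single-head attention in which the learned query, key, and value projections are replaced by the identity, so every token is compared with every other token in a single step. The patch vectors are toy placeholder values, not real image data.

```python
import math


def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Width (pixels, one axis) seen by one output pixel after stacked stride-1 convs."""
    # Each stride-1 layer with a k x k kernel widens the field by (k - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(distance: int, kernel_size: int = 3) -> int:
    """Fewest stride-1 conv layers so one output pixel sees two pixels `distance` apart."""
    # Both pixels fit inside the field once its width reaches distance + 1.
    return math.ceil(distance / (kernel_size - 1))


def self_attention(tokens: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Simplified single-head self-attention (identity Q/K/V projections, an assumption)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Scaled dot-product score between this token and EVERY token, near or far.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        # Numerically stable softmax turns the scores into attention weights.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * tok[i] for w, tok in zip(row, tokens)) for i in range(d)])
    return outputs, weights


if __name__ == "__main__":
    # Two objects about 200 pixels apart in a 224-pixel-wide image.
    print("3x3 conv layers needed:", layers_needed(200))  # 100
    print("receptive field after 100 layers:", receptive_field(100))  # 201

    # A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 tokens.
    # Toy 4-dimensional placeholder vectors stand in for real patch embeddings.
    patches = [[math.sin(i + j) for j in range(4)] for i in range(196)]
    _, attn = self_attention(patches)
    print("attention matrix:", len(attn), "x", len(attn[0]))  # 196 x 196
```

Pooling or strided convolutions cut the layer count sharply, which is one reason practical CNNs downsample. The sketch only illustrates that locality forces depth, whereas a single attention layer already relates all 196 patches to one another through a 196 × 196 weight matrix.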
