Multi-head attention and feedforward network

From the course: TensorFlow: Working with NLP


- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word. But what if we could run multiple instances of the self-attention mechanism, so that each could perform a different task? One could link nouns to adjectives, and another could connect pronouns to their subjects. This is called multi-head attention, and BERT base has 12 such heads in each of its layers. Each multi-head attention block gets three inputs: the query, the key, and the value. For each attention head, these are passed through separate, fully connected linear (dense) layers before the attention function is applied. The model can then jointly attend to information from different representations and at different positions, allowing it to make richer connections between words.
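
The mechanism described above can be sketched in a few lines. This is a minimal illustration in plain NumPy, not the course's own code: the function and parameter names (`multi_head_attention`, `w_q`, `w_k`, `w_v`, `w_o`) are hypothetical, the hidden size of 768 is an assumption chosen to match BERT base, biases are left out, and the final output projection `w_o` follows the standard Transformer design rather than anything stated in the transcript. One wide projection that is then split into 12 slices stands in for the 12 separate per-head linear layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key, value, params, num_heads):
    """query, key, value: (seq_len, d_model) arrays.
    params: dict of (d_model, d_model) weight matrices (hypothetical names)."""
    seq_len, d_model = query.shape
    d_head = d_model // num_heads

    def split_heads(x):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    # Linear layers for query, key and value, then one slice per head.
    q = split_heads(query @ params["w_q"])
    k = split_heads(key @ params["w_k"])
    v = split_heads(value @ params["w_v"])

    # Scaled dot-product attention, computed independently in every head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # (num_heads, seq_len, seq_len)
    heads = weights @ v                  # (num_heads, seq_len, d_head)

    # Concatenate the heads and mix them with an output projection (assumed).
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ params["w_o"], weights

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 768, 12, 5   # assumed sizes, 12 heads as in the transcript
params = {name: rng.normal(scale=0.02, size=(d_model, d_model))
          for name in ("w_q", "w_k", "w_v", "w_o")}

x = rng.normal(size=(seq_len, d_model))    # stand-in for 5 token embeddings
output, attention_weights = multi_head_attention(x, x, x, params, num_heads)
print(output.shape)              # (5, 768)
print(attention_weights.shape)   # (12, 5, 5)
```

Passing the same `x` as query, key, and value gives self-attention; each of the 12 slices of `attention_weights` is one head's own view of how the 5 tokens relate to each other. In TensorFlow itself, `tf.keras.layers.MultiHeadAttention` provides a ready-made layer for this.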
