From the course: TensorFlow: Working with NLP
Multi-head attention and feedforward network
- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word. But what if we could have multiple instances of the self-attention mechanism, so that each one can perform a different task? One could link nouns to their adjectives, and another could connect pronouns to their subjects. This is called multi-head attention, and BERT base has 12 such heads in each layer. Each multi-head attention block takes three inputs: the query, the key, and the value. Before the attention function is applied, the query, key, and value are each passed through separate, fully connected (dense) linear layers for each attention head. Because every head works on its own projection of the inputs, the model can jointly attend to information from different representation subspaces at different positions, which lets it make richer connections between words.
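To make the per-head projections concrete, here is a minimal NumPy sketch of multi-head attention. It is not the course's exercise code. It assumes randomly initialized weights, a single unbatched sequence, and the BERT-base sizes (hidden size 768, 12 heads of 64 dimensions each). The helper names `init_params` and `multi_head_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model, num_heads, rng):
    """Hypothetical random weights: one W_q, W_k, W_v per head, plus an output projection."""
    d_head = d_model // num_heads
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_q": rng.normal(0.0, scale, (num_heads, d_model, d_head)),
        "W_k": rng.normal(0.0, scale, (num_heads, d_model, d_head)),
        "W_v": rng.normal(0.0, scale, (num_heads, d_model, d_head)),
        "W_o": rng.normal(0.0, scale, (num_heads * d_head, d_model)),
    }

def multi_head_attention(query, key, value, params):
    """query: (seq_q, d_model); key, value: (seq_k, d_model).
    Returns the output (seq_q, d_model) and attention weights (heads, seq_q, seq_k)."""
    # Separate linear projection of query, key, and value for each head -> (heads, seq, d_head)
    q = np.einsum("sd,hde->hse", query, params["W_q"])
    k = np.einsum("sd,hde->hse", key, params["W_k"])
    v = np.einsum("sd,hde->hse", value, params["W_v"])
    d_head = q.shape[-1]
    # Scaled dot-product attention, computed independently in every head
    scores = np.einsum("hqe,hke->hqk", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = np.einsum("hqk,hke->hqe", weights, v)  # (heads, seq_q, d_head)
    # Concatenate the heads along the feature axis, then apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(query.shape[0], -1)
    return concat @ params["W_o"], weights

# Usage: self-attention over a toy sequence of 5 random "token" vectors
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 768, 12, 5
x = rng.normal(size=(seq_len, d_model))
params = init_params(d_model, num_heads, rng)
out, attn = multi_head_attention(x, x, x, params)
print(out.shape)   # (5, 768)
print(attn.shape)  # (12, 5, 5)
```

Storing one weight matrix per head mirrors the description above. Production implementations usually use a single dense layer of width `num_heads * d_head` and reshape its output, which is mathematically equivalent. In TensorFlow, `tf.keras.layers.MultiHeadAttention(num_heads=..., key_dim=...)` packages this pattern as a ready-made layer.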