# GPU Synchronization in Chrome

Chrome supports multiple mechanisms for sequencing GPU drawing operations. This
document provides a brief overview. The main focus is a high-level explanation
of when synchronization is needed and which mechanism is appropriate.

[TOC]

## Glossary

**GL Sync Object**: Generic GL-level synchronization object that can be in a
"unsignaled" or "signaled" state. The only current implementation of this is a
GL fence.

**GL Fence**: A GL sync object that is inserted into the GL command stream. It
starts out unsignaled and becomes signaled when the GPU reaches this point in the
command stream, implying that all previous commands have completed.

**Client Wait**: Block the client thread until a sync object becomes signaled,
or until a timeout occurs.

**Server Wait**: Tells the GPU to defer executing commands issued after a fence
until the fence signals. The client thread continues executing immediately and
can continue submitting GL commands.

**CHROMIUM fence sync**: A command buffer specific GL fence that sequences
operations among command buffer GL contexts without requiring driver-level
execution of previous commands.

**Native GL Fence**: A GL Fence backed by a platform-specific cross-process
synchronization mechanism.

**GPU Fence Handle**: An IPC-transportable object (typically a file descriptor)
that can be used to duplicate a native GL fence into a different process's
context.

**GPU Fence**: A Chrome abstraction that owns a GPU fence handle representing a
native GL fence, usable for cross-process synchronization.

## Use case overview

The core scenario is synchronizing read and write access to a shared resource,
for example drawing an image into an offscreen texture and compositing the
result into a final image. The drawing operations need to be completed before
reading to ensure correct output. A typical effect of wrong synchronization is
that the output contains blank or incomplete results instead of the expected
rendered sub-images, causing flickering or tearing.

"Completed" in this case means that the end result of using a resource as input
will be equivalent to waiting for everything to finish rendering, but it does
not necessarily mean that the GPU has fully finished all drawing operations at
that time.

## Single GL context: no synchronization needed

If all access to the shared resource happens in the same GL context, there is no
need for explicit synchronization. GL guarantees that commands are logically
processed in the order they are submitted. This is true both for local GL
contexts (GL calls via ui/gl/ interfaces) and for a single command buffer GL
context.

## Multiple driver-level GL contexts in the same share group: use GLFence

A process can create multiple GL contexts that are part of the same share group.
These contexts can be created in different threads within this process.

In this case, GL fences must be used for sequencing, for example:

1. Context A: draw image, create GLFence
1. Context B: server wait or client wait for GLFence, read image

[gl::GLFence](/ui/gl/gl_fence.h) and its subclasses provide wrappers for
GL/EGL fence handling methods such as `eglFenceSyncKHR` and `eglWaitSyncKHR`.
These fence objects can be used cross-thread as long as both threads' GL
contexts are part of the same share group.
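
The two-step sequence above can be illustrated with a small single-threaded
model. This is only a sketch: `ModelFence`, `ModelContext` and
`RunSharedTextureExample` are hypothetical names invented for illustration and
are not part of `gl::GLFence` or any GL API. Each context is modeled as a
command queue, the fence signals when context A's queue reaches it, and a
server wait stalls context B's queue without blocking the code that submits
commands.

```c++
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GL fence: starts unsignaled.
struct ModelFence {
  bool signaled = false;
};

// Hypothetical stand-in for one GL context's command queue.
class ModelContext {
 public:
  void Enqueue(std::function<void()> work) {
    commands_.push_back({std::move(work), nullptr});
  }
  // The fence signals once every earlier command in this queue has run.
  void InsertFence(std::shared_ptr<ModelFence> fence) {
    commands_.push_back({[fence] { fence->signaled = true; }, nullptr});
  }
  // Server wait: later commands in this queue are deferred until the fence
  // signals. Submitting the wait itself returns immediately.
  void ServerWait(std::shared_ptr<ModelFence> fence) {
    commands_.push_back({nullptr, std::move(fence)});
  }
  // Executes at most one command. Returns false if empty or blocked.
  bool Step() {
    if (commands_.empty()) return false;
    Command& next = commands_.front();
    if (next.wait_for && !next.wait_for->signaled) return false;
    if (next.work) next.work();
    commands_.pop_front();
    return true;
  }

 private:
  struct Command {
    std::function<void()> work;
    std::shared_ptr<ModelFence> wait_for;
  };
  std::deque<Command> commands_;
};

// Context A draws and inserts a fence; context B waits, then reads.
std::vector<std::string> RunSharedTextureExample() {
  std::vector<std::string> log;
  auto fence = std::make_shared<ModelFence>();
  ModelContext a, b;
  a.Enqueue([&log] { log.push_back("A: draw image"); });
  a.InsertFence(fence);
  b.ServerWait(fence);
  b.Enqueue([&log] { log.push_back("B: read image"); });
  // Favor B on purpose: it makes no progress until A's fence signals.
  bool progress = true;
  while (progress) {
    progress = b.Step();
    if (!progress) progress = a.Step();
  }
  return log;
}
```

Even though the loop always tries context B first, the read is logged after
the draw, because B's queue is stalled on the fence.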

For more details, please refer to the underlying extension documentation, for example:

* https://www.khronos.org/opengl/wiki/Synchronization
* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_fence_sync.txt
* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_wait_sync.txt

## Implementation-dependent: same-thread driver-level GL contexts

Many GL driver implementations are based on a per-thread command queue,
with the effect that commands are processed in order even if they were issued
from different contexts on that thread without explicit synchronization.

This behavior is not part of the GL standard, and some driver implementations
use a per-context command queue where this assumption is not true.

See [issue 510232](http://crbug.com/510243#c23) for an example of a problematic
sequence:

```
// In one thread:
MakeCurrent(A);
Render1();
MakeCurrent(B);
Render2();
CreateSync(X);

// And in another thread:
MakeCurrent(C);
WaitSync(X);
Render3();
MakeCurrent(D);
Render4();
```

The only serialization guarantee is that Render2 will complete before Render3,
but Render4 could theoretically complete before Render1.
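
The guarantee above can be checked by brute force. The following sketch is an
illustration only: the helper names are hypothetical, and the two functions at
the end encode this section's description of per-context and per-thread
command queues as ordering constraints, not the behavior of any particular
driver. It counts the completion orders of the four render steps that satisfy
a given set of "must complete before" constraints.

```c++
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Counts the completion orders of Render1..Render4 (encoded as 1..4) that
// satisfy every "first must complete before second" constraint.
int CountValidOrders(const std::vector<std::pair<int, int>>& before) {
  std::array<int, 4> order = {1, 2, 3, 4};
  int count = 0;
  do {
    // Position of a render step within the candidate completion order.
    auto pos = [&order](int render) {
      return std::find(order.begin(), order.end(), render) - order.begin();
    };
    bool ok = true;
    for (const auto& constraint : before)
      ok = ok && pos(constraint.first) < pos(constraint.second);
    if (ok) ++count;
  } while (std::next_permutation(order.begin(), order.end()));
  return count;
}

// Per-context queues: only the sync object orders anything (Render2 before
// Render3).
int PerContextOrders() { return CountValidOrders({{2, 3}}); }

// Per-thread queues: each thread's render steps also complete in issue order.
int PerThreadOrders() { return CountValidOrders({{1, 2}, {2, 3}, {3, 4}}); }
```

Under the per-context assumption half of the 24 permutations remain possible,
including ones where Render4 completes before Render1. Under the per-thread
assumption only the order Render1, Render2, Render3, Render4 remains.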

Chrome assumes that the render steps happen in order Render1, Render2, Render3,
and Render4, and requires this behavior to ensure security. If the driver doesn't
ensure this sequencing, Chrome has to emulate it using virtual contexts. (Or by
using explicit synchronization, but it doesn't do that today.) See also the
"CHROMIUM fence sync" section below.

## Command buffer GL clients: use CHROMIUM sync tokens

Chrome's command buffer IPC interface uses multiple layers. There are multiple
active IPC channels (typically one per process, i.e. one per Renderer and one
for Browser). Each IPC channel has multiple scheduling groups (also called
streams), and each stream can contain multiple command buffers, which in turn
contain a sequence of GL commands.

Command buffers in the same client-side share group must be in the same stream.
Command scheduling granularity is at the stream level, and a client can choose to
create and use multiple streams with different stream priorities. Stream IDs are
arbitrary integers assigned by the client at creation time; see for example the
[viz::ContextProviderCommandBuffer](/services/viz/public/cpp/gpu/context_provider_command_buffer.h)
constructor.

The CHROMIUM sync token is intended to order operations among command buffer GL
instructions. Generating one inserts an internal fence sync command in the
stream, flushes it appropriately (see below), and produces a sync token, which
is a cross-context transportable reference to the underlying fence sync. A
`WaitSyncTokenCHROMIUM` call does **not** ensure that the underlying GL commands
have been executed at the GPU driver level, so this mechanism is not suitable for
synchronizing command buffer GL operations with a local driver-level GL context.

See the
[CHROMIUM_sync_point](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt)
documentation for details.

Commands issued within a single command buffer don't need to be synchronized
explicitly. They will be executed in the same order that they were issued.

Multiple command buffers within the same stream can use an ordering barrier to
sequence their commands. Sync tokens are not necessary. Example:

```c++
// Command buffers gl1 and gl2 are in the same stream.
Render1(gl1);
gl1->OrderingBarrierCHROMIUM();
Render2(gl2); // will happen after Render1.
```

Command buffers that are in different streams need to use sync tokens. If both
are using the same IPC channel (i.e. same client process), an unverified sync
token is sufficient, and commands do not need to be flushed to the server:

```c++
// stream A
Render1(glA);
glA->GenUnverifiedSyncTokenCHROMIUM(out_sync_token);

// stream B
glB->WaitSyncTokenCHROMIUM(out_sync_token);
Render2(glB); // will happen after Render1.
```
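
To see why the wait is enough to order the two streams, the cross-stream case
can be modeled as a toy scheduler. Everything in this sketch is an assumption
made for illustration: `ModelSyncToken`, `ModelTask` and `ModelScheduler` are
hypothetical names, and the per-stream release counter is a simplification, not
a description of Chrome's actual scheduler. It also ignores flushing and
verification.

```c++
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical sync token: names a fence sync by stream and release count.
struct ModelSyncToken {
  int stream_id;
  uint64_t release_count;
};

struct ModelTask {
  enum Kind { kRender, kReleaseFence, kWaitToken } kind;
  std::string name;      // used by kRender
  ModelSyncToken token;  // used by kReleaseFence and kWaitToken
};

class ModelScheduler {
 public:
  void Render(int stream, const std::string& name) {
    queues_[stream].push_back({ModelTask::kRender, name, {}});
  }
  // Inserts a fence sync into `stream` and returns a token referring to it.
  ModelSyncToken GenSyncToken(int stream) {
    ModelSyncToken token{stream, ++next_release_[stream]};
    queues_[stream].push_back({ModelTask::kReleaseFence, "", token});
    return token;
  }
  void WaitSyncToken(int stream, ModelSyncToken token) {
    queues_[stream].push_back({ModelTask::kWaitToken, "", token});
  }
  // Runs all streams to completion, visiting higher stream IDs first so that
  // the waiting stream in the example gets the first chance to run.
  std::vector<std::string> Run() {
    std::vector<std::string> order;
    bool progress = true;
    while (progress) {
      progress = false;
      for (auto it = queues_.rbegin(); it != queues_.rend(); ++it) {
        std::deque<ModelTask>& queue = it->second;
        if (queue.empty()) continue;
        const ModelTask& task = queue.front();
        if (task.kind == ModelTask::kWaitToken &&
            released_[task.token.stream_id] < task.token.release_count)
          continue;  // stream stays blocked until the fence sync is released
        if (task.kind == ModelTask::kRender) order.push_back(task.name);
        if (task.kind == ModelTask::kReleaseFence)
          released_[task.token.stream_id] = task.token.release_count;
        queue.pop_front();
        progress = true;
      }
    }
    return order;
  }

 private:
  std::map<int, std::deque<ModelTask>> queues_;
  std::map<int, uint64_t> next_release_;
  std::map<int, uint64_t> released_;
};

// Mirrors the stream A / stream B example.
std::vector<std::string> CrossStreamExample() {
  ModelScheduler scheduler;
  const int kStreamA = 1, kStreamB = 2;
  scheduler.Render(kStreamA, "Render1");
  ModelSyncToken token = scheduler.GenSyncToken(kStreamA);
  scheduler.WaitSyncToken(kStreamB, token);
  scheduler.Render(kStreamB, "Render2");
  return scheduler.Run();
}
```

The model scheduler prefers stream B, yet Render2 is only issued once stream
A's fence sync has been released. Nothing in the model waits for the GPU to
finish Render1, which matches the caveat that a sync token wait does not imply
driver-level completion.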

Command buffers that are using different IPC channels must use verified sync
tokens. Verification is a check that the underlying fence sync was flushed to
the server. Cross-process synchronization always uses verified sync tokens.
`GenSyncTokenCHROMIUM` will force a shallow flush as a side effect if necessary.
Example:

```c++
// IPC channel in process X
Render1(glX);
glX->GenSyncTokenCHROMIUM(out_sync_token);

// IPC channel in process Y, after the sync token was transported from X
glY->WaitSyncTokenCHROMIUM(out_sync_token);
Render2(glY); // will happen after Render1.
```

Alternatively, unverified sync tokens can be converted to verified ones in bulk
by calling `VerifySyncTokensCHROMIUM`. This will wait for a flush to complete as
necessary. Use this to avoid multiple sequential flushes:

```c++
gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[0]);
gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[1]);
gl->VerifySyncTokensCHROMIUM(out_sync_tokens, 2);
```

### Implementation notes

Correctness of the CHROMIUM fence sync mechanism depends on the assumption that
commands issued from the command buffer service side happen in the order they
were issued in that thread. This is handled in different ways:

* Issue a glFlush on switching contexts on platforms where glFlush is sufficient
  to ensure ordering, i.e. macOS. (This approach would not be well suited to
  tiling GPUs as used on many mobile devices, where glFlush is an expensive
  operation because it may force content load/store between tile memory and
  main memory.) See for example
  [gl::GLContextCGL::MakeCurrent](/ui/gl/gl_context_cgl.cc):
```c++
  // It's likely we're going to switch OpenGL contexts at this point.
  // Before doing so, if there is a current context, flush it. There
  // are many implicit assumptions of flush ordering between contexts
  // at higher levels, and if a flush isn't performed, OpenGL commands
  // may be issued in unexpected orders, causing flickering and other
  // artifacts.
```

* Force context virtualization so that all commands are issued into a single
  driver-level GL context. This is used on Qualcomm/Adreno chipsets, see [issue
  691102](http://crbug.com/691102).

* Assume per-thread command queues without explicit synchronization. GLX
  effectively ensures this. On Windows, ANGLE uses a single D3D device
  underneath all contexts which ensures strong ordering.

GPU control tasks are processed out of band and are only partially ordered
with respect to GL commands. A gpu_control task always happens before any
following GL commands issued on the same IPC channel. It usually executes before
any preceding unflushed GL commands, but this is not guaranteed. A
`ShallowFlushCHROMIUM` ensures that any following gpu_control tasks will execute
after the flushed GL commands.

In this example, DoTask will execute after GLCommandA and before GLCommandD, but
there is no ordering guarantee relative to GLCommandB and GLCommandC:

```c++
  // gles2_implementation.cc

  helper_->GLCommandA();
  ShallowFlushCHROMIUM();

  helper_->GLCommandB();
  helper_->GLCommandC();
  gpu_control_->DoTask();

  helper_->GLCommandD();

  // Execution order is one of:
  //   A | DoTask B C | D
  //   A | B DoTask C | D
  //   A | B C DoTask | D
```

The shallow flush adds the pending GL commands to the service's task queue, and
this task queue is also used by incoming gpu control tasks and processed in
order. The `ShallowFlushCHROMIUM` command returns as soon as the tasks are
queued and does not wait for them to be processed.
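
The partial ordering can be made concrete by enumerating the allowed execution
orders. This is a minimal sketch under the rules stated in this section;
`PossibleOrders` is a hypothetical helper written for illustration and does not
exist in the Chrome code base.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns every allowed execution order for a gpu_control task that was issued
// after the `flushed` and `unflushed` GL commands and before the `following`
// ones. The task may land anywhere among the unflushed commands, but never
// before a flushed command or after a following command.
std::vector<std::vector<std::string>> PossibleOrders(
    const std::vector<std::string>& flushed,
    const std::vector<std::string>& unflushed,
    const std::vector<std::string>& following,
    const std::string& task) {
  std::vector<std::vector<std::string>> result;
  for (std::size_t slot = 0; slot <= unflushed.size(); ++slot) {
    std::vector<std::string> order = flushed;
    // Unflushed commands that happen to run before the task.
    order.insert(order.end(), unflushed.begin(), unflushed.begin() + slot);
    order.push_back(task);
    // Unflushed commands that happen to run after the task.
    order.insert(order.end(), unflushed.begin() + slot, unflushed.end());
    order.insert(order.end(), following.begin(), following.end());
    result.push_back(order);
  }
  return result;
}
```

Calling `PossibleOrders({"A"}, {"B", "C"}, {"D"}, "DoTask")` yields the three
orders listed in the example. Moving B and C into the flushed list, as an extra
`ShallowFlushCHROMIUM` before the task would, leaves a single order.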

## Cross-process transport: GpuFence and GpuFenceHandle

Some platforms such as Android (most devices N and above) and ChromeOS support
synchronizing a native GL context with a command buffer GL context through a
GpuFence.

Use the static `gl::GLFence::IsGpuFenceSupported()` method to check at runtime if
the current platform has support for the GpuFence mechanism including
GpuFenceHandle transport.

The GpuFence mechanism supports two use cases:

* Create a GLFence object in a local context, convert it to a client-side
  GpuFence, duplicate it into a command buffer service-side gpu fence, and
  issue a server wait on the command buffer service side. That service-side
  wait will be unblocked when the *client-side* GpuFence signals.

* Create a new command buffer service-side gpu fence, request a GpuFenceHandle
  from it, use this handle to create a native GL fence object in the local
  context, then issue a server wait on the local GL fence object. This local
  server wait will be unblocked when the *service-side* gpu fence signals.

The [CHROMIUM_gpu_fence
extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt) documents
the GLES API as used through the command buffer interface. This section contains
additional information about the integration with local GL contexts that is
needed to work with these objects.

### Driver-level wrappers

In general, you should use the static `gl::GLFence::CreateForGpuFence()` and
`gl::GLFence::CreateFromGpuFence()` factory methods to create a
platform-specific local fence object instead of using an implementation class
directly.

For Android and ChromeOS, the
[gl::GLFenceAndroidNativeFenceSync](/ui/gl/gl_fence_android_native_fence_sync.h)
implementation wraps the
[EGL_ANDROID_native_fence_sync](https://www.khronos.org/registry/EGL/extensions/ANDROID/EGL_ANDROID_native_fence_sync.txt)
extension that allows creating a special EGLFence object from which a file
descriptor can be extracted, and then creating a duplicate fence object from
that file descriptor that is synchronized with the original fence.

### GpuFence and GpuFenceHandle

A [gfx::GpuFence](/ui/gfx/gpu_fence.h) object owns a GPU fence handle
representing a native GL fence. The `AsClientGpuFence` method casts it to a
ClientGpuFence type for use with the [CHROMIUM_gpu_fence
extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt)'s
`CreateClientGpuFenceCHROMIUM` call.

A [gfx::GpuFenceHandle](/ui/gfx/gpu_fence_handle.h) is an IPC-transportable
wrapper for a file descriptor or other underlying primitive object, and is used
to duplicate a native GL fence into another process. It has value semantics and
can be copied multiple times, and then consumed exactly one time. Consumers take
ownership of the underlying resource. Current GpuFenceHandle consumers are:

* The `gfx::GpuFence(gpu_fence_handle)` constructor takes ownership of the
  handle's resources without constructing a local fence.

* The IPC subsystem closes resources after sending. The typical idiom is to call
  `gfx::CloneHandleForIPC(handle)` on a GpuFenceHandle retrieved from a
  scope-lifetime object to create a copied handle that will be owned by the IPC
  subsystem.

### Sample Code

A usage example for two-process synchronization is to sequence access to a
globally shared drawable such as an AHardwareBuffer on Android, where the
writer uses a local GL context and the reader is a command buffer context in
the GPU process. The writer process draws into an AHardwareBuffer-backed
GLImage in the local GL context, then creates a gpu fence to mark the end of
drawing operations:

```c++
  // This example assumes that GpuFence is supported. If not, the application
  // should fall back to a different transport or synchronization method.
  DCHECK(gl::GLFence::IsGpuFenceSupported());

  // ... write to the shared drawable in local context, then create
  // a local fence.
  std::unique_ptr<gl::GLFence> local_fence = gl::GLFence::CreateForGpuFence();

  // Convert to a GpuFence.
  std::unique_ptr<gfx::GpuFence> gpu_fence = local_fence->GetGpuFence();
  // It's ok for local_fence to be destroyed now, the GpuFence remains valid.

  // Create a matching gpu fence on the command buffer context, issue
  // server wait, and destroy it.
  GLuint id = gl->CreateClientGpuFenceCHROMIUM(gpu_fence->AsClientGpuFence());
  // It's ok for gpu_fence to be destroyed now.
  gl->WaitGpuFenceCHROMIUM(id);
  gl->DestroyGpuFenceCHROMIUM(id);

  // ... read from the shared drawable via command buffer. These reads
  // will happen after the local_fence has signaled. The local_fence
  // and gpu_fence don't need to remain alive for this.
```

If a process wants to consume a drawable that was produced through a command
buffer context in the GPU process, the sequence is as follows:

```c++
  // Set up callback that's waiting for the drawable to be ready.
  void callback(std::unique_ptr<gfx::GpuFence> gpu_fence) {
    // Create a local context GL fence from the GpuFence.
    std::unique_ptr<gl::GLFence> local_fence =
        gl::GLFence::CreateFromGpuFence(*gpu_fence);
    local_fence->ServerWait();
    // ... read from the shared drawable in the local context.
  }

  // ... write to the shared drawable via command buffer, then
  // create a gpu fence:
  GLuint id = gl->CreateGpuFenceCHROMIUM();
  context_support->GetGpuFenceHandle(id, base::BindOnce(callback));
  gl->DestroyGpuFenceCHROMIUM(id);
```

It is legal to create the GpuFence on a separate command buffer context instead
of on the command buffer channel that did the drawing operations, but in that
case `gl->WaitSyncTokenCHROMIUM()` or equivalent must be used to sequence the
operations between the distinct command buffer contexts as usual.