Sunday, January 13, 2013

Consumer/Producer approach for synchronizing buffer access using EGL fences

Motivation

At work I had the chance to do some application development related to displaying video content on a 3D surface in Android. For what it's worth, the use case was a little more involved than simply texturing one of the faces of a cube with video. Texture streaming is nothing new; it has been done before, usually through proprietary vendor extensions. More recently, the Android team exposed the feature to application developers starting with the Android 4.0 (ICS) release.

However, soon enough I found myself reading the native C++ code when a screen tearing problem showed up in some of the rendered frames. The root cause was that access to the native buffer was not properly synchronized between the video decoder and the 3D client app. Note that these two components run asynchronously, each in its own process.

So, in this post I discuss some of the things that I learned about this particular use case, specifically some of the EGL extensions that are required to efficiently stream video frames onto a 3D surface. More importantly, I go over a fairly new approach for synchronization (at the GPU driver level) using EGL sync objects that helped with the tearing problem discussed above (code snippets are provided at the bottom).

Why screen tearing happens

Screen tearing is a common problem in graphics, and in my case it was caused by the video decoder overwriting the buffer (at the wrong time) while its content was still being read. This problem is usually solved by synchronizing access to the buffer so that write access is granted only after the buffer's content has been consumed (i.e., displayed on the main screen).
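To make the failure mode concrete, here is a small single-threaded C simulation (not Android or EGL code): the "display" scans a buffer top to bottom, and the "decoder" is either allowed to overwrite it halfway through the scan or held back until the scan ends. The row count, frame ids and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 8

/* Each row remembers which frame last wrote it. */
static int buffer[ROWS];

/* Hypothetical decoder step: overwrite every row with a new frame id. */
static void decoder_write(int frame_id) {
    for (int r = 0; r < ROWS; r++)
        buffer[r] = frame_id;
}

/* Hypothetical scan-out: copy rows [from, to) into the image on screen. */
static void display_scan(int shown[ROWS], int from, int to) {
    for (int r = from; r < to; r++)
        shown[r] = buffer[r];
}

/* Shows one frame and reports whether it mixes rows from two frames.
 * With wait_for_scan == false the decoder writes halfway through the scan;
 * otherwise it waits until the whole frame has been consumed. */
static bool image_is_torn(bool wait_for_scan) {
    int shown[ROWS];
    decoder_write(1);                      /* frame 1 is ready */
    display_scan(shown, 0, ROWS / 2);      /* top half scanned */
    if (!wait_for_scan)
        decoder_write(2);                  /* overwrite mid-scan */
    display_scan(shown, ROWS / 2, ROWS);   /* bottom half scanned */
    if (wait_for_scan)
        decoder_write(2);                  /* overwrite after the scan */

    for (int r = 1; r < ROWS; r++)
        if (shown[r] != shown[0])
            return true;                   /* rows from different frames */
    return false;
}
```

Without the wait, the top half of the displayed image comes from frame 1 and the bottom half from frame 2, which is exactly the tear seen on screen.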

Going back to the main use case: I knew that the buffers were getting overwritten, so the only remaining question was what type of synchronization to use.

EGLImage extensions

But before answering that question, let's talk about EGLImages, as they are an important building block for displaying video content as OpenGL ES textures. The Khronos group came up with EGLImage so that buffers could be shared across rendering APIs (OpenVG, OpenGL ES, and OpenMAX) without extra copies. For example, consider a UI into which both 3D and video content are written from the same rendering context (think of a YouTube widget). Without this common data type, the application would have to copy the data around, usually with a glTexImage2D() call. For video frames, which the app needs to show quickly, these copies waste important resources and hinder performance (see Figure 1).


Figure 1. Data copies waste important resources such as CPU cycles and memory bandwidth. Figure taken from [2].
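To put a rough number on that waste, the sketch below computes how many bytes per second one extra per-frame upload (for example through glTexImage2D()) would have to move. The resolution, frame rate and pixel formats are assumptions chosen for illustration, and the helper names are hypothetical; a real driver may add further internal copies or format conversions on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one frame; bits_per_pixel lets us cover YUV 4:2:0 (12 bpp). */
static uint64_t frame_bytes(uint64_t width, uint64_t height,
                            uint64_t bits_per_pixel) {
    return width * height * bits_per_pixel / 8;
}

/* Bytes per second added by copying every frame once more. */
static uint64_t copy_bytes_per_second(uint64_t width, uint64_t height,
                                      uint64_t bits_per_pixel, uint64_t fps) {
    return frame_bytes(width, height, bits_per_pixel) * fps;
}
```

For a 1920x1080 RGBA (32 bpp) stream at 60 fps that is 8,294,400 bytes per frame, or 497,664,000 bytes (about 475 MiB) per second; a 12-bpp YUV 4:2:0 stream still costs 186,624,000 bytes per second, just for the extra copy.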

With a common shared buffer across APIs the application is now able to reuse the EGLImage as both the destination of the decode and as a source for an OpenGL ES texture without copying any data (see Figure 2).


Figure 2. An EGLImage surface used as a double-purpose buffer. Figure taken from [2].
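Conceptually, the EGLImage behaves like one allocation with two handles. The CPU-only C sketch below models that idea; no EGL is involved, and the struct and function names are hypothetical. The decoder's destination and the texture's source point at the same storage, so a decoded frame is visible to the texture without any copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the storage behind an EGLImage. */
typedef struct {
    uint8_t *pixels;
    size_t size;
} SharedImage;

/* Two "views" of the same image: one for the decoder, one for GL. */
typedef struct { SharedImage *image; } DecoderTarget;
typedef struct { const SharedImage *image; } TextureSource;

static SharedImage *shared_image_create(size_t size) {
    SharedImage *img = malloc(sizeof *img);
    if (!img) return NULL;
    img->pixels = calloc(size, 1);
    if (!img->pixels) { free(img); return NULL; }
    img->size = size;
    return img;
}

static void shared_image_destroy(SharedImage *img) {
    if (img) { free(img->pixels); free(img); }
}

/* The decoder writes straight into the shared storage... */
static void decode_into(DecoderTarget *dst, uint8_t value) {
    memset(dst->image->pixels, value, dst->image->size);
}

/* ...and the texture samples the very same bytes: nothing was copied. */
static uint8_t sample_texel(const TextureSource *src, size_t i) {
    return src->image->pixels[i];
}
```

The flip side of sharing is that a write through the decoder view is visible to the texture view immediately, which is why the synchronization discussed next is needed.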

For the specification of all these extensions follow these links:

In summary, EGLImages provide a common surface that can be shared between rendering APIs. The feature proved so powerful in terms of performance and flexibility (more recently supporting YUV content in addition to RGB) that it became an important building block for many rendering engines (Android and WebKit, to name a few). Follow this link to read more about the concept of DirectTextures using EGLImages in Android, and to understand why they perform so well in the Android rendering pipeline.

EGL sync objects

One drawback of EGLImages for developers is that synchronization falls to the application: any update to the buffer is immediately reflected on the other side, in the OpenGL ES texture. Since applications usually run at 60 fps, we must guarantee that the texture stays free of glitches or artifacts for about 16 milliseconds, the interval at which the display typically refreshes with new content.
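The 16-millisecond figure is simply the refresh period at 60 Hz. The helper below (a hypothetical name) computes it in nanoseconds, the unit of the EGL sync wait timeout used later in the code snippets, so the same value could in principle serve as a bounded timeout instead of waiting forever.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds between refreshes for a display running at `hz`
 * (integer division, so the fractional nanosecond is dropped). */
static uint64_t frame_interval_ns(uint64_t hz) {
    return 1000000000ULL / hz;
}
```

At 60 Hz this gives 16,666,666 ns (about 16.7 ms); at 30 Hz it gives 33,333,333 ns.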

Access to the buffers could potentially be handled at the application level, but that's far from ideal because it puts too much of a burden on developers. Thankfully, the Khronos group has also made available another extension that takes care of inter-API fencing and signaling at the driver level. The main idea is to insert a 'fence' command into the client API's command stream, in this case right before eglSwapBuffers() is called, and then test this sync object for completion to find out when the entire frame has completed. Since the fence was inserted as the second-to-last command, it is not signaled until all the commands issued before it have completed.
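The signaling rule can be modeled on the CPU with a counter of completed commands: a fence records how many commands were submitted before it, and it counts as signaled once the GPU has retired that many. This is only a model of the semantics described above, assuming in-order execution, and not how any driver implements fences; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an in-order GPU command stream. */
typedef struct {
    uint64_t submitted;   /* commands queued by the client API */
    uint64_t completed;   /* commands the GPU has finished, in order */
} CommandStream;

/* A fence remembers how many commands were queued before it. */
typedef struct { uint64_t waits_for; } Fence;

static void submit(CommandStream *s, uint64_t n) { s->submitted += n; }

static Fence insert_fence(const CommandStream *s) {
    Fence f = { s->submitted };
    return f;
}

/* The GPU retires up to n commands, never past what was submitted. */
static void gpu_retire(CommandStream *s, uint64_t n) {
    uint64_t pending = s->submitted - s->completed;
    s->completed += n < pending ? n : pending;
}

/* Signaled only when every command issued before the fence is done. */
static bool fence_signaled(const CommandStream *s, Fence f) {
    return s->completed >= f.waits_for;
}
```

For example, with five draw commands, then the fence, then the swap, the fence stays unsignaled after three commands retire and becomes signaled once all five draws are done, regardless of whether the swap has finished.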

To put things in perspective with the video-as-a-texture use case, it's now easy to imagine a separate video decode thread continuously polling for when it may start decoding new frames (which in effect synchronizes access to the buffer). The 3D client app, on the other hand, is busy constructing the scene, texturing the geometry whenever new video content is available, and finally putting a fence right before eglSwapBuffers() is called. Because the 3D client app is the one that inserts the fence, it can take as much time as it needs to display the frame without any glitches or tearing: the fence guarantees that the content of the buffer remains intact until the signal to update the buffer is sent.

Figure 3. Consumer/Producer approach to synchronizing buffer access using EGL fence sync objects. Figure taken from [1].
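A minimal two-thread sketch of this handshake, with POSIX threads on the CPU standing in for the GPU: the "renderer" reads the shared buffer and then signals (playing the role of the fence completing after eglSwapBuffers()), while the "decoder" blocks until that signal before writing the next frame. The mutex/condition-variable pair is my substitute for the driver-level fence, the decoder blocks rather than polls, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define FRAMES 100

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int pixels;        /* stand-in for the shared EGLImage contents */
    bool consumed;     /* stand-in for "the fence has signaled" */
    int mismatches;    /* frames the renderer saw overwritten or skipped */
} Handoff;

static void *decoder(void *arg) {
    Handoff *h = arg;
    for (int frame = 1; frame <= FRAMES; frame++) {
        pthread_mutex_lock(&h->lock);
        while (!h->consumed)                  /* wait for the "fence" */
            pthread_cond_wait(&h->changed, &h->lock);
        h->pixels = frame;                    /* safe to overwrite now */
        h->consumed = false;
        pthread_cond_broadcast(&h->changed);
        pthread_mutex_unlock(&h->lock);
    }
    return NULL;
}

static void *renderer(void *arg) {
    Handoff *h = arg;
    for (int expected = 1; expected <= FRAMES; expected++) {
        pthread_mutex_lock(&h->lock);
        while (h->consumed)                   /* wait for a new frame */
            pthread_cond_wait(&h->changed, &h->lock);
        if (h->pixels != expected)            /* "draw" with the texture */
            h->mismatches++;
        h->consumed = true;                   /* "fence signaled" */
        pthread_cond_broadcast(&h->changed);
        pthread_mutex_unlock(&h->lock);
    }
    return NULL;
}

/* Runs both threads and returns how many frames were shown wrongly. */
static int run_handoff(void) {
    Handoff h = { .pixels = 0, .consumed = true, .mismatches = 0 };
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.changed, NULL);
    pthread_t d, r;                           /* error checks omitted */
    pthread_create(&d, NULL, decoder, &h);
    pthread_create(&r, NULL, renderer, &h);
    pthread_join(d, NULL);
    pthread_join(r, NULL);
    pthread_cond_destroy(&h.changed);
    pthread_mutex_destroy(&h.lock);
    return h.mismatches;
}
```

With a single buffer the two threads strictly alternate, so every frame is seen exactly once, but neither side can make progress while the other holds the buffer.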

For maximum performance, a queue of EGLImages lets the video decoder and the 3D client app work in parallel without blocking each other: the decoder fills one image while the app renders from another.
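A CPU-side sketch of such a queue, again using POSIX threads: QUEUE_SLOTS stand-ins for EGLImages form a ring, the decoder writes into a free slot while the renderer reads an older one outside the lock, and a slot is only returned to the decoder after the renderer releases it (the point where the fence would signal). The slot count, frame count and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_SLOTS 3     /* hypothetical number of EGLImages */
#define QUEUE_FRAMES 100

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int slot[QUEUE_SLOTS];   /* frame id held by each image stand-in */
    int head, tail, filled;  /* ring indices and number of queued frames */
    int out_of_order;        /* frames the renderer received wrongly */
} ImageQueue;

static void *queue_decoder(void *arg) {
    ImageQueue *q = arg;
    for (int frame = 1; frame <= QUEUE_FRAMES; frame++) {
        pthread_mutex_lock(&q->lock);
        while (q->filled == QUEUE_SLOTS)      /* every image still in use */
            pthread_cond_wait(&q->changed, &q->lock);
        int idx = q->tail;                    /* only the decoder moves tail */
        pthread_mutex_unlock(&q->lock);

        q->slot[idx] = frame;                 /* "decode" outside the lock */

        pthread_mutex_lock(&q->lock);
        q->tail = (q->tail + 1) % QUEUE_SLOTS;  /* publish the frame */
        q->filled++;
        pthread_cond_broadcast(&q->changed);
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

static void *queue_renderer(void *arg) {
    ImageQueue *q = arg;
    for (int expected = 1; expected <= QUEUE_FRAMES; expected++) {
        pthread_mutex_lock(&q->lock);
        while (q->filled == 0)                /* nothing decoded yet */
            pthread_cond_wait(&q->changed, &q->lock);
        int idx = q->head;                    /* only the renderer moves head */
        pthread_mutex_unlock(&q->lock);

        if (q->slot[idx] != expected)         /* "render" outside the lock */
            q->out_of_order++;

        pthread_mutex_lock(&q->lock);
        q->head = (q->head + 1) % QUEUE_SLOTS;  /* release: "fence signaled" */
        q->filled--;
        pthread_cond_broadcast(&q->changed);
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Runs both threads and returns how many frames arrived wrongly. */
static int run_queue(void) {
    ImageQueue q = { .head = 0, .tail = 0, .filled = 0, .out_of_order = 0 };
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.changed, NULL);
    pthread_t d, r;                           /* error checks omitted */
    pthread_create(&d, NULL, queue_decoder, &q);
    pthread_create(&r, NULL, queue_renderer, &q);
    pthread_join(d, NULL);
    pthread_join(r, NULL);
    pthread_cond_destroy(&q.changed);
    pthread_mutex_destroy(&q.lock);
    return q.out_of_order;
}
```

Because the decoder only writes the slot at `tail` while fewer than QUEUE_SLOTS frames are queued, it never touches the slot the renderer is reading, so both threads can do their slow work concurrently.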

For the specification of the EGL fence sync extension follow this link:

Code snippets:

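#define EGL_EGLEXT_PROTOTYPES   /* declare the KHR fence sync entry points */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* State shared by the three threads below; dpy and surface are set up
 * elsewhere with eglGetDisplay()/eglInitialize()/eglCreateWindowSurface(). */
static EGLDisplay dpy;
static EGLSurface surface;
/* Created by compositor(), consumed by sync_listener_callback(); a real
 * app would hand it over through a lock or queue. */
static EGLSyncKHR fence = EGL_NO_SYNC_KHR;
static atomic_bool cpu_access = false;   /* true once the buffer may be rewritten */

void updatePixels(void);                 /* decoder's buffer update, defined elsewhere */

void *media_server(void *arg) {
    (void)arg;
    //Process other tasks until signaled
    if (atomic_load(&cpu_access)) {
        updatePixels();
        atomic_store(&cpu_access, false);
    }
    return NULL;
}

void *sync_listener_callback(void *arg) {
    (void)arg;
    EGLint value = 0;
    //blocks the calling thread until the specified sync object <sync> is
    //signaled, or until <timeout> nanoseconds have passed.
    EGLint result = eglClientWaitSyncKHR(dpy, fence,
                                         EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,
                                         EGL_FOREVER_KHR);
    if (result == EGL_FALSE) {
        printf("EGL FENCE: error waiting for fence: %#x\n", (unsigned)eglGetError());
        return NULL;
    }
    if (eglGetSyncAttribKHR(dpy, fence, EGL_SYNC_STATUS_KHR, &value) &&
        value == EGL_SIGNALED_KHR) {
        atomic_store(&cpu_access, true);   // GPU no longer needs the buffer
    }
    eglDestroySyncKHR(dpy, fence);
    fence = EGL_NO_SYNC_KHR;
    return NULL;
}

void *compositor(void *arg) {
    (void)arg;
    /* ... GL draw calls that sample the EGLImage-backed texture ... */

    //By inserting a sync object just before eglSwapBuffers is called, it is
    //possible to wait on that fence, allowing another thread to determine
    //when the GPU has finished the commands that use the EGLImage.
    fence = eglCreateSyncKHR(dpy, EGL_SYNC_FENCE_KHR, NULL);
    if (fence == EGL_NO_SYNC_KHR) {
        printf("EGL FENCE: error creating fence: %#x\n", (unsigned)eglGetError());
    }
    eglSwapBuffers(dpy, surface);
    return NULL;
}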

References:

[1] Imagination Technologies Ltd, "EGLImage, NPOT & Stencils: PowerVR Performance Recommendations". Available online.
[2] Neil Trevett, "Khronos Mobile Graphics and Media Ecosystem". Available online.
[3] The Android Open Source Project, "Unit tests for Graphics". Available online.