To display a view for each eye, the simplest method is to run the render loop twice: each eye configures and runs its own iteration of the render loop, and at the end we have two images that we can send to the display device. The underlying implementation uses two Unity cameras, one per eye, which each step through the process of generating a stereo image. This was Unity's first method of XR support and is still offered by third-party headset plug-ins.
Although this method certainly works, multi-camera rendering relies on brute force and is the least efficient in terms of CPU and GPU usage. The CPU must iterate through the entire render loop twice, and the GPU is most likely unable to take advantage of any caching of objects drawn twice across the eyes.
Multi-Pass was Unity’s first attempt to optimize the XR render loop. The core idea was to extract parts of the render loop that were view-independent. This means that any work that does not explicitly rely on XR views does not have to be done per eye.
The most obvious candidate for this optimization is shadow rendering. Shadows are not explicitly dependent on the viewer's position. Unity actually implements shadows in two steps: generate cascaded shadow maps, then map the shadows into screen space. For Multi-Pass, we can generate a single set of cascaded shadow maps and then generate two screen-space shadow maps, one per eye, since the screen-space maps do depend on the viewer's location. Because of how our shadow generation is architected, the screen-space shadow maps benefit from locality: the shadow map generation loop is relatively tightly coupled. This compares favorably with the rest of the render workload, which requires a full iteration over the render loop before returning to a similar stage (e.g., the eye-specific opaque passes are separated by the remaining render loop stages).
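The split described above can be sketched as follows. This is illustrative pseudocode in Python, not Unity's renderer; the function names and the call-counting dictionary are assumptions for demonstration only.

```python
# Count how often each shadow stage runs, to show the view-independent /
# view-dependent split (names are hypothetical, not Unity API).
calls = {"cascades": 0, "screen_space": 0}

def generate_cascaded_shadow_maps():
    # View-independent: cascades depend on the light, not on the eye.
    calls["cascades"] += 1
    return "cascades"

def resolve_screen_space_shadows(cascades, eye):
    # View-dependent: screen-space shadows depend on the eye's viewpoint.
    calls["screen_space"] += 1
    return f"shadows-{eye}"

def render_shadows(eyes=("left", "right")):
    cascades = generate_cascaded_shadow_maps()  # done once per frame
    return [resolve_screen_space_shadows(cascades, eye) for eye in eyes]

render_shadows()
assert calls == {"cascades": 1, "screen_space": 2}
```

The point is simply that the expensive cascade generation runs once, while only the cheaper screen-space resolve runs per eye.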
The other step that can be shared between the two eyes may not be obvious at first: we can perform a single cull for both eyes. In our first implementation, frustum culling generated two lists of objects, one per eye. Instead, we can build a unified culling frustum that is shared between the two eyes. This means each eye renders slightly more than it strictly needs, but we judged the benefit of a single cull to outweigh the cost of some extra vertex shading, clipping, and rasterization.
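A simplified 2D (top-down) sketch of a unified frustum, written in Python as an illustration of the idea rather than Unity's actual culling code: each frustum is modeled by its two side planes, and the unified frustum takes the left eye's left plane and the right eye's right plane so that it encloses both eye volumes.

```python
# Model a frustum side plane as the line x = x0 + slope * z (z = depth).

def side_planes(eye_x, half_fov_tan):
    # (left plane, right plane); the visible region lies between them.
    return ((eye_x, -half_fov_tan), (eye_x, half_fov_tan))

def inside(planes, x, z):
    (lx0, ls), (rx0, rs) = planes
    return lx0 + ls * z <= x <= rx0 + rs * z

def unified(left_eye_planes, right_eye_planes):
    # Left eye's left plane + right eye's right plane encloses both volumes.
    return (left_eye_planes[0], right_eye_planes[1])

ipd = 0.064  # assumed interpupillary distance in meters
t = 1.0      # tan of an assumed 45-degree half field of view
left = side_planes(-ipd / 2, t)
right = side_planes(+ipd / 2, t)
both = unified(left, right)

# A point just inside the left eye's view but outside the right eye's view
# survives the shared cull (no popping), at the cost of slight over-inclusion:
assert inside(left, -1.03, 1.0) and not inside(right, -1.03, 1.0)
assert inside(both, -1.03, 1.0)
```

The slight over-inclusion is exactly the "little more than a single eye" cost mentioned above.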
Multi-Pass gave us some nice savings over multi-camera rendering, but there is more we can do.
Single-Pass stereo rendering means that we make a single traversal of the entire render loop, instead of traversing certain sections of it twice.
To issue both draws, we need to make sure all the constant data is bound, along with an index that identifies which eye each draw belongs to.
What about the draws themselves? How do we issue each one? In Multi-Pass, each eye has its own render target, but we can't do that for Single-Pass, because the cost of switching render targets between consecutive draw calls would be prohibitive. A similar option would be render target arrays, but on most platforms we would have to export the slice index from a geometry shader, which can be expensive on the GPU and invasive for existing shaders.
The solution we settled on was to use a double-wide render target and switch the viewport between draw calls, so that each eye renders into its own half of the double-wide target. Switching viewports has a cost, but it is cheaper than switching render targets and less invasive than using a geometry shader (although Double-Wide poses its own challenges, especially in post-processing). There is also the related option of viewport arrays, but they share the same problem as render target arrays: the index can only be exported from a geometry shader. Another technique uses dynamic clipping, but we won't explore it here.
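The double-wide layout can be sketched in a few lines. This is an assumption for illustration (coordinates in pixels, origin at the left edge), not engine code:

```python
def eye_viewport(eye_index, eye_width, eye_height):
    # Eye 0 renders into the left half of the double-wide target,
    # eye 1 into the right half; only the x offset differs.
    x_offset = eye_index * eye_width
    return (x_offset, 0, eye_width, eye_height)

# For 1024x1024 per-eye targets, the double-wide target is 2048x1024:
assert eye_viewport(0, 1024, 1024) == (0, 0, 1024, 1024)
assert eye_viewport(1, 1024, 1024) == (1024, 0, 1024, 1024)
```

The per-draw state change is thus reduced to a single viewport rectangle rather than a render target bind.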
Now that we have a way to issue two consecutive draws, one per eye, we need to configure our supporting infrastructure. In Multi-Pass, because it was similar to monoscopic rendering, we could use our existing view and projection matrix infrastructure; all we had to do was replace the view and projection matrices with the current eye's matrices. With Single-Pass, however, we want to avoid needlessly switching between constant buffer bindings. Instead, we bind the view and projection matrices of both eyes together and index into them with unity_StereoEyeIndex, which we can toggle between draws. This allows our shader infrastructure to choose, inside the shader pass, which set of view and projection matrices to render with.
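A minimal sketch of this both-eyes-bound-at-once idea, written in Python rather than HLSL, and reducing the view matrices to plain horizontal offsets for brevity; the names mirror Unity's unity_StereoEyeIndex convention but are otherwise assumptions:

```python
IPD = 0.064  # assumed eye separation in meters

# Both eyes' view transforms are "bound" together in one array,
# indexed the way unity_StereoEyeIndex indexes the real matrices.
stereo_view_offsets = (+IPD / 2, -IPD / 2)  # left eye shifts world right, etc.

def view_transform_x(world_x, stereo_eye_index):
    # The shader selects its transform by eye index; nothing is rebound
    # between the two eye draws.
    return world_x + stereo_view_offsets[stereo_eye_index]

# A point at the world origin lands at a slightly different
# view-space x for each eye:
assert view_transform_x(0.0, 0) == +0.032
assert view_transform_x(0.0, 1) == -0.032
```

Toggling unity_StereoEyeIndex between draws is then the only per-eye constant update needed.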
One additional detail: to minimize viewport and unity_StereoEyeIndex state changes, we can change our eye-draw cadence. Instead of drawing left, right, left, right, and so on, we can use a left, right, right, left, left, right, right cadence. This allows us to halve the number of state updates compared with the strictly alternating cadence.
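The halving claim is easy to verify by counting. A sketch of that reasoning (not engine code), assuming one state update per eye switch plus one initial setup:

```python
def state_changes(eye_sequence):
    # One initial setup, plus one update whenever the eye differs
    # from the previous draw.
    changes = 1
    for prev, cur in zip(eye_sequence, eye_sequence[1:]):
        if cur != prev:
            changes += 1
    return changes

n = 4  # number of objects, each drawn once per eye (8 draws total)
alternating = ["L", "R"] * n  # L R L R L R L R
paired = []                   # L R R L L R R L
for i in range(n):
    paired += ["L", "R"] if i % 2 == 0 else ["R", "L"]

assert state_changes(alternating) == 2 * n  # an update before every draw
assert state_changes(paired) == n + 1       # roughly half as many
```

Every draw in the alternating cadence needs a state update, while the paired cadence only updates on every other draw.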
This is not exactly twice as fast as Multi-Pass, because culling and shadows were already optimized there, and because we still issue one draw per eye and switch viewports between them, which incurs some CPU and GPU cost.
Stereo Instancing (Single Pass Instanced).
Earlier, we mentioned the possibility of using a render target array. Render target arrays are a natural solution for stereo rendering: the eye textures share format and size, qualifying them for use in a render target array. But having to use the geometry shader to export the array slice is a big drawback. What we really want is the ability to export the render target array index from the vertex shader, for easier integration and better performance.
The ability to export the render target array index from the vertex shader actually exists on some GPUs and APIs and is becoming more common. On DX11, this functionality is exposed as the feature option VPAndRTArrayIndexFromAnyShaderFeedingRasterizer.
Now that we can determine which slice of our render target array to render to, how do we select that slice? We leverage the existing Single-Pass Double-Wide infrastructure: we can use unity_StereoEyeIndex to populate the SV_RenderTargetArrayIndex semantic in the shader. On the API side, we no longer need to switch viewports, because the same viewport can be used for both slices of the render target array. And we have already configured our matrices so that they can be indexed from the vertex shader.
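The contrast with Double-Wide can be sketched abstractly. This is a Python model of the per-eye state, purely an illustration (not graphics-API code): with a render target array, the only per-eye difference is the slice index exported by the vertex stage.

```python
def target_for_eye_double_wide(eye_index, eye_w, eye_h):
    # Double-Wide: one texture, but a different viewport rectangle per eye.
    return {"texture": "double_wide", "viewport": (eye_index * eye_w, 0, eye_w, eye_h)}

def target_for_eye_array(eye_index):
    # Render target array: same texture AND same viewport; only the slice
    # differs, filled from unity_StereoEyeIndex via SV_RenderTargetArrayIndex.
    return {"texture": "eye_array", "slice": eye_index}

# The viewport state change disappears; the slice is per-vertex output:
assert target_for_eye_array(0) == {"texture": "eye_array", "slice": 0}
assert target_for_eye_array(1) == {"texture": "eye_array", "slice": 1}
```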
Although we could keep using the existing technique of issuing two draws and switching the value of unity_StereoEyeIndex in the constant buffer before each draw, there is a more efficient technique: we can use GPU instancing to issue a single draw call and let the GPU multiplex our draws across both eyes. We simply double the existing instance count of each draw (if a draw has no instancing, we set its instance count to 2). Then, in the vertex shader, we decode the instance ID to determine which eye we are rendering to.
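The decode step can be sketched as follows. This is one possible encoding, written in Python for illustration; in real Unity shaders this is handled by the instancing macros, and the exact bit layout is an assumption here:

```python
def decode(instance_id):
    # With the instance count doubled, even/odd instances map to the
    # left/right eye, and halving recovers the original instance index.
    eye_index = instance_id & 1
    original_instance = instance_id >> 1
    return eye_index, original_instance

# A draw of 3 instances becomes 6; each original instance appears once per eye:
seen = [decode(i) for i in range(6)]
assert seen == [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

One doubled draw call thus covers every (instance, eye) pair without any API-side state change between eyes.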
The biggest benefit of this technique is that we literally halve the number of draw calls generated on the API side, saving CPU time. In addition, the GPU itself can process the draws more efficiently: although the same amount of rendering work is generated, it processes one draw call instead of two, and it no longer has to change the viewport between draws as it does with the traditional Single-Pass technique.
Please note: this is available only for users running their desktop VR experience on Windows 10 or deploying to HoloLens.
Single Pass Multi View.
Multi-View is an extension available on certain OpenGL/OpenGL ES implementations in which the driver itself multiplexes individual draw calls across both eyes. Instead of explicitly instancing the draw call and decoding the instance ID into an eye index in the shader, the driver is responsible for duplicating the draws and generating the view index (gl_ViewID) in the shader.
There is one underlying implementation detail that differs from stereo instancing: instead of the vertex shader explicitly selecting the render target array slice to rasterize to, the driver itself determines the render target. The shader uses gl_ViewID to compute view-dependent state, but not to select the render target. In practice this doesn't matter much to the developer, but it is an interesting detail.