Alain Galvan · 5/18/2020
A review of the rendering architecture of the new Minecraft RTX renderer, covering its path tracer, its SVGF denoiser, and its DLSS implementation.
Tags: blog, game engine, frame analysis, render, pbr, prepass, minecraft
Minecraft is a creative survival game where you mine for resources, craft tools, build farms, explore the world in search of valuable treasure, and build portals to neighboring dimensions: the Nether and the End.
It's a social game as well: friends on different platforms can join in to build interesting structures together, and it provides an intuitive interface for programming your own servers and mods.
Recently, Minecraft for Windows 10 received an update to its renderer that introduced dramatic improvements to its design, adding real-time ray tracing, ray-tracing denoising, and deep learning super sampling.
So let's review how Minecraft RTX renders a single frame, referencing research papers where appropriate.
⚠️ Note: This is not an official analysis of the Minecraft RTX renderer, and it covers a beta version, so the design presented here may change in the final release. Please support Minecraft RTX by grabbing a copy of it on the Windows Store, thanks!

Prior to rendering, some preprocessing is done to cache Physically Based Rendering (PBR) material textures into a 2048x2048 lookup texture (LUT).
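Since Minecraft's base block textures are 16x16, a 2048x2048 atlas fits a 128x128 grid of them. As a minimal sketch of the bookkeeping involved (the tile layout and names here are illustrative assumptions, not the renderer's actual scheme):

```cpp
#include <cstdint>

// Hypothetical atlas addressing: map a material's tile index to the UV
// rectangle a hit shader would sample from the 2048x2048 LUT.
struct UVRect { float u0, v0, u1, v1; };

constexpr uint32_t kAtlasSize   = 2048;
constexpr uint32_t kTileSize    = 16;                     // Minecraft block textures
constexpr uint32_t kTilesPerRow = kAtlasSize / kTileSize; // 128 tiles per row

UVRect TileToUV(uint32_t tileIndex)
{
    const uint32_t x = (tileIndex % kTilesPerRow) * kTileSize;
    const uint32_t y = (tileIndex / kTilesPerRow) * kTileSize;
    const float scale = 1.0f / float(kAtlasSize);
    return { x * scale, y * scale,
             (x + kTileSize) * scale, (y + kTileSize) * scale };
}
```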

Each frame of Minecraft RTX begins by rebuilding bottom-level acceleration structures (BLAS) via BuildRaytracingAccelerationStructure(...) for any animated objects in the scene, such as animals or players.
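As a rough sketch of what such a per-frame refit looks like in D3D12 (buffer names and the Vertex layout are illustrative, not Minecraft's actual code), the ALLOW_UPDATE + PERFORM_UPDATE flags refit the existing BLAS in place, which is far cheaper than a full rebuild for skinned geometry:

```cpp
#include <d3d12.h>

struct Vertex { float position[3]; float normal[3]; float uv[2]; }; // assumed layout

void RefitAnimatedBlas(ID3D12GraphicsCommandList4* cmdList,
                       ID3D12Resource* skinnedVB, UINT vertexCount,
                       ID3D12Resource* blas, ID3D12Resource* scratch)
{
    // Describe the triangle geometry from this frame's skinned vertex buffer.
    D3D12_RAYTRACING_GEOMETRY_DESC geometry = {};
    geometry.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geometry.Flags = D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE;
    geometry.Triangles.VertexBuffer.StartAddress  = skinnedVB->GetGPUVirtualAddress();
    geometry.Triangles.VertexBuffer.StrideInBytes = sizeof(Vertex);
    geometry.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
    geometry.Triangles.VertexCount  = vertexCount;

    // Refit (update) the BLAS in place rather than rebuilding from scratch.
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC build = {};
    build.Inputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
    build.Inputs.Flags =
        D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_UPDATE |
        D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE;
    build.Inputs.DescsLayout    = D3D12_ELEMENTS_LAYOUT_ARRAY;
    build.Inputs.NumDescs       = 1;
    build.Inputs.pGeometryDescs = &geometry;
    build.SourceAccelerationStructureData  = blas->GetGPUVirtualAddress();
    build.DestAccelerationStructureData    = blas->GetGPUVirtualAddress();
    build.ScratchAccelerationStructureData = scratch->GetGPUVirtualAddress();

    cmdList->BuildRaytracingAccelerationStructure(&build, 0, nullptr);
}
```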

The top-level acceleration structure is then built, and we proceed to DispatchRays(...) for our primary rays and light shadows, writing to a variety of different output buffers.
All passes are rendered twice next to one another on the x axis at 742x835, for a total of 1484x835, rather than the 2560x1440 output resolution. This odd resolution appears to be roughly a quarter of the output width and half of the output height, plus some padding. On the left is primary ray hit data; on the right is primary ray hit data through transmissive surfaces such as water.
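Host-side, that dispatch boils down to filling out a D3D12_DISPATCH_RAYS_DESC and calling DispatchRays at the interleaved resolution above. A minimal sketch, with the miss and hit-group table setup elided and the shader-table resource assumed:

```cpp
#include <d3d12.h>

void DispatchPrimaryRays(ID3D12GraphicsCommandList4* cmdList,
                         ID3D12Resource* shaderTable /* assumed prebuilt */)
{
    D3D12_DISPATCH_RAYS_DESC desc = {};
    desc.RayGenerationShaderRecord.StartAddress = shaderTable->GetGPUVirtualAddress();
    desc.RayGenerationShaderRecord.SizeInBytes  = D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES;
    // MissShaderTable / HitGroupTable ranges elided for brevity.
    desc.Width  = 1484; // two 742-wide fields side by side
    desc.Height = 835;
    desc.Depth  = 1;
    cmdList->DispatchRays(&desc);
}
```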
For sampling, Minecraft uses a blue noise array of 128 256x256 RGBA8 images.
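A typical way such an array is consumed (the indexing scheme here is my assumption, not confirmed) is to tile one 256x256 slice across the screen and advance through the 128 slices by frame index, so each pixel sees a different blue-noise value every frame:

```cpp
#include <cstdint>

struct RGBA8 { uint8_t r, g, b, a; };

// Fetch a per-pixel, per-frame blue-noise value from a 128-slice array of
// 256x256 RGBA8 textures laid out contiguously in memory.
RGBA8 SampleBlueNoise(const RGBA8* textures /* 128 * 256 * 256 texels */,
                      uint32_t pixelX, uint32_t pixelY, uint32_t frame)
{
    const uint32_t layer = frame & 127;  // cycle through the 128 slices
    const uint32_t x = pixelX & 255;     // tile the slice across the screen
    const uint32_t y = pixelY & 255;
    return textures[(layer * 256 + y) * 256 + x];
}
```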
More rays are cast for non-primary bounces, writing to another set of output buffers.

Then another ray dispatch is done, writing to additional buffers.


Volumetric fog is computed last.
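The exact fog technique isn't documented, but a pass of this kind generally ray marches from the camera toward the primary hit, accumulating in-scattered light attenuated by Beer-Lambert transmittance. A generic single-scattering sketch, with the density and falloff constants purely assumed:

```cpp
#include <algorithm>
#include <cmath>

// March toward the primary hit along a ray, returning the accumulated
// in-scattering term (a real pass would scale this by sun color/visibility).
float MarchFog(float hitDistance, float cameraY, float rayDirY)
{
    const int   kSteps   = 32;
    const float kSigma   = 0.02f; // extinction coefficient (assumed)
    const float kFalloff = 0.05f; // density falloff with height (assumed)

    const float stepLen = hitDistance / kSteps;
    float transmittance = 1.0f, inscatter = 0.0f;
    for (int i = 0; i < kSteps; ++i)
    {
        float y = cameraY + rayDirY * stepLen * (i + 0.5f);
        float density = kSigma * std::exp(-kFalloff * std::max(y, 0.0f));
        inscatter += transmittance * density * stepLen; // light scattered in
        transmittance *= std::exp(-density * stepLen);  // Beer-Lambert absorption
    }
    return inscatter;
}
```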
Minecraft uses Spatiotemporal Variance-Guided Filtering [Schied 2017], which consists of a variance-driven spatio-temporal reprojection step that takes our moment buffers, history length, and previous frames, and tries to place that data where it would be in the current frame.
For more information on real time tracing denoising with A-SVGF, check out my blog post on the subject.
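At its core, the temporal step accumulates color and the first two luminance moments with an exponential moving average, then estimates per-pixel variance from those moments. A scalar per-pixel sketch following [Schied 2017], assuming the reprojected history already passed its geometry consistency tests:

```cpp
#include <algorithm>

struct TemporalPixel
{
    float color;      // accumulated luminance (scalar for brevity)
    float moment1;    // running E[l]
    float moment2;    // running E[l^2]
    float historyLen; // frames of valid history
};

void AccumulatePixel(TemporalPixel& history, float newLuminance, float* varianceOut)
{
    history.historyLen = std::min(history.historyLen + 1.0f, 32.0f);
    // Short history blends faster (alpha near 1); settles at the paper's 0.2.
    const float alpha = std::max(1.0f / history.historyLen, 0.2f);

    history.color   += alpha * (newLuminance - history.color);
    history.moment1 += alpha * (newLuminance - history.moment1);
    history.moment2 += alpha * (newLuminance * newLuminance - history.moment2);

    // Variance = E[l^2] - E[l]^2, clamped to avoid negatives from drift.
    *varianceOut = std::max(history.moment2 - history.moment1 * history.moment1, 0.0f);
}
```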
This is followed by a bilateral filtering step, where over the course of several passes they adaptively blur these frames. They also use a form of irradiance caching to help resolve reprojected regions with little history information more quickly.
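The blur is guided by the edge-stopping functions from [Schied 2017]: the weight between a pixel and a neighbor in each à-trous pass is the product of a depth term, a normal term, and a variance-normalized luminance term. A sketch using the paper's default sigmas:

```cpp
#include <cmath>

// Edge-stopping weight between center pixel p and neighbor q.
// depthGradient is the screen-space depth derivative dotted with (q - p).
float EdgeStoppingWeight(float depthP, float depthQ, float depthGradient,
                         float nDotN /* dot(normalP, normalQ) */,
                         float lumP, float lumQ, float lumStdDev)
{
    const float kSigmaZ = 1.0f, kSigmaN = 128.0f, kSigmaL = 4.0f;

    float wZ = std::exp(-std::fabs(depthP - depthQ) /
                        (kSigmaZ * std::fabs(depthGradient) + 1e-4f));
    float wN = std::pow(std::fmax(0.0f, nDotN), kSigmaN);
    float wL = std::exp(-std::fabs(lumP - lumQ) / (kSigmaL * lumStdDev + 1e-4f));
    return wZ * wN * wL; // noisy regions (high variance) blur more
}
```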

Finally, interleaved buffers are combined to produce the final output that will be fed to the DLSS 2.0 kernel.
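How exactly the two fields are recombined isn't visible from the captures; one plausible, purely speculative scheme is alternating columns from the two side-by-side halves:

```cpp
// Speculative recombine: even output columns come from the left field,
// odd columns from the right field. Scalar image for brevity.
void CombineFields(const float* sideBySide, // fieldWidth*2 x height input
                   float* combined,         // fieldWidth*2 x height output
                   int fieldWidth, int height)
{
    const int outWidth = fieldWidth * 2;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < outWidth; ++x)
        {
            int field = x & 1;                    // 0 = left field, 1 = right
            int srcX  = field * fieldWidth + x / 2;
            combined[y * outWidth + x] = sideBySide[y * outWidth + srcX];
        }
}
```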
Deep Learning Super Sampling 2.0 (DLSS 2.0) uses an autoencoder that takes as input a jittered render target at a fraction of the output resolution and a jittered velocity buffer, similar to Temporal Anti-Aliasing, and outputs an upscaled version of the final output.
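That means the engine must jitter its projection each frame and hand DLSS the jitter offsets alongside motion vectors. A common choice for the jitter sequence (assumed here, not confirmed for Minecraft) is a Halton (2,3) sequence:

```cpp
#include <cstdint>

// Radical-inverse Halton sequence in the given base, in [0, 1).
float Halton(uint32_t index, uint32_t base)
{
    float f = 1.0f, result = 0.0f;
    while (index > 0)
    {
        f /= base;
        result += f * (index % base);
        index /= base;
    }
    return result;
}

// Sub-pixel jitter in [-0.5, 0.5) pixels for the current frame,
// cycling a short 16-sample sequence.
void FrameJitter(uint32_t frame, float* jitterX, float* jitterY)
{
    const uint32_t i = (frame % 16) + 1;
    *jitterX = Halton(i, 2) - 0.5f;
    *jitterY = Halton(i, 3) - 0.5f;
}
```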

UIs are rendered out at full resolution to a separate render target and composited at the end of the frame.
Minecraft finishes the frame with tone mapping and any enabled postprocessing effects such as vignette.
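The specific tone-mapping operator isn't public, so as a representative stand-in, here is an ACES-fitted curve (Narkowicz's approximation) followed by a simple radial vignette:

```cpp
#include <algorithm>

// Krzysztof Narkowicz's ACES filmic approximation, per channel.
float AcesToneMap(float x)
{
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    return std::clamp((x * (a * x + b)) / (x * (c * x + d) + e), 0.0f, 1.0f);
}

// Darken toward the corners; (u, v) in [0, 1], strength is assumed.
float Vignette(float u, float v, float strength = 0.25f)
{
    float dx = u - 0.5f, dy = v - 0.5f;
    return 1.0f - strength * (dx * dx + dy * dy) * 4.0f;
}
```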

Minecraft is a game that really benefits from a ray-tracing system: with its dynamic environments and extreme variance in shadows and lighting, traversing cave systems and exploring structures feels much more immersive and interesting. If this was insightful, or if you have any suggestions for another game to analyze, let me know in the comments, and I'll see you next time.
NVIDIA and Microsoft released the video Introduction to Real Time Ray Tracing with Minecraft.
Digital Foundry's Minecraft RTX developer interview provides a high level overview of the RenderDragon DX12 RTX renderer as well as the challenges the developer team had to face.
Peter Kristof of Microsoft made a really robust RTX Ambient Occlusion example with an implementation of SVGF here.
Microsoft's DirectML Super Resolution Example, while not NVIDIA Deep Learning Super Sampling 2.0 (DLSS 2.0), is similar in that both perform upscaling.
NVIDIA has released an SDK to integrate Deep Learning Super Sampling here. AMD's FidelityFX Super Resolution is also available for developers to integrate into their applications.
GTC 2020 - Creating Physically Based Materials for Minecraft with RTX reviews how Minecraft RTX designed their Physically Based Materials using Adobe Substance Designer.
[Schied 2017] Schied et al., Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path Traced Global Illumination, HPG 2017.