Topic:
Notes on implementing wavefront path tracing.
Entry level:
Knowledge of what path tracing is; preferably some awareness of wavefront theory.
Read Time:
15 minutes.
Project Link:
GitHub Link:
TBA
Before You Start:
Wavefront article by Jacco Bikker - wavefront theory.
Why?
Wavefront path tracing is one of the most modern ways of structuring a GPU path-tracing algorithm. Why so? In short, performance research showed that, contrary to expectations, ray tracing is compute bound more than bandwidth bound. This means the top priority is to make as many threads as possible perform the same action, rather than to minimize resource accesses within a warp (which is, of course, also important). So, the goal of moving away from recursive/iterative path tracing is to reduce divergence.
Goal
What Do We Start From?
I do not recommend starting straight away with the Wavefront approach, as it is rather confusing. Learning the basics of path tracing in a Wavefront setup would be uncomfortable.
So, create (or research in depth) a CPU recursive path tracer => a GPU iterative one => finish with the Wavefront.
Other topics to check out: NEE (next event estimation - sampling direct lighting separately from indirect lighting) and RR (Russian roulette - probabilistically killing rays and reweighting the survivors so the estimate stays unbiased).
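As a taste of RR, here is a minimal HLSL sketch of the usual survival-probability scheme (the names "throughput", "seed", and "RandomFloat" are assumptions about the surrounding shader):

    // Russian roulette sketch: kill dim paths, reweight survivors to stay unbiased.
    float p = clamp(max(throughput.r, max(throughput.g, throughput.b)), 0.05f, 0.95f);
    if (RandomFloat(seed) > p)
        return;          // the path dies here
    throughput /= p;     // survivors carry the energy of the killed paths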
So, let's agree that we have a simple GPU path tracer made in a single shader with a while loop.
Implementation Overview
Let's take a very rough, simplified path-tracing algorithm and break it into parts; you probably have something similar in your own path tracer. Each of these parts becomes a separate shader, so let's go through them in the order of execution.
Generate: Creates rays from the camera. Executes once a frame. Fills up a ray batch, which is passed to the next shader.
Extend: Traces a ray to find an intersection. Does this both for primary and secondary rays. Fills up the extend results batch.
Shade: The actual shading code: PDFs and BSDFs are calculated here. NEE and RR are also part of Shade. Fills up the shadow ray batch and the bounced (new) ray batch.
Connect: Traces a shadow ray to find an intersection. Adds energy to the final accumulated buffer if the shadow ray is not occluded.
Finalize: Assembles the final output image (gamma correction, etc.).
As we can see, there are two ways to structure your wavefront shader dispatches. The first one starts with Generate; then, for each ray bounce, we call Extend to trace rays into the geometry, Shade to calculate all the shading values, and Connect to trace shadow rays. In Finalize we apply gamma correction, frame accumulation, etc.
The second structure moves shadow-ray tracing outside of the loop. For this to work, the shadow ray buffer needs to be larger than the number of primary rays (roughly 2 times larger, depending on the application).
We will move on with the first implementation, as it is a bit simpler and more linear.
Resources:
Look at the picture on the left: it lists the general set of resources for each shader. Of course, this can change based on your application or the way you prefer to structure the rendering code. For instance, a lot of this data can be moved to bindless heaps, resource resets can be done through copying or writing in the shader, etc.
Speaking of resetting values: on each wavefront loop iteration (ray bounce) we need to reset the counters and swap the previous bounce's ray data with the new one. We do this after Connect.
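The reset can be as small as a single-thread dispatch; a minimal sketch, assuming the counters live in one small buffer:

    // Counter-reset sketch: one thread zeroes the atomic counters between bounces.
    RWStructuredBuffer<uint> counters : register(u0); // [0] = new rays, [1] = shadow rays

    [numthreads(1, 1, 1)]
    void ResetCounters(uint3 id : SV_DispatchThreadID)
    {
        counters[0] = 0;
        counters[1] = 0;
    }

The Ray Batch / New Ray Batch swap itself is typically just a ping-pong of buffer bindings on the CPU side.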
Resource size: by default, all the batches and the wavefront output are 1D buffers with the number of elements = screen width * screen height.
Dispatch threads: by default, with 1 spp, Generate and Finalize are 2D dispatches of the screen size; the rest are 1D dispatches with screen width * screen height threads.
Closer look at resources:
Ray Batch and New Ray Batch use the same struct - Ray. Extend Batch uses the Extend Result struct, and Shadow Ray Batch uses the Shadow Ray struct.
All of these structs can change a bit according to the application's needs. Moreover, for performance reasons, values in these structs can be compacted: the model and instance ids, for instance, can be packed into a single uint.
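As an illustration, here is a rough sketch of what these structs might look like (the exact fields and layout are assumptions; adapt them to your renderer):

    // Rough struct sketches; field choice is illustrative, not a prescription.
    struct Ray
    {
        float3 origin;
        float3 direction;
        float3 throughput;  // accumulated BSDF * cos / pdf along the path
        uint   pixelIndex;  // which pixel this path contributes to
    };

    struct ExtendResult
    {
        float  t;                  // hit distance (miss if t < 0)
        float2 barycentrics;
        uint   instanceAndModelId; // both ids packed into one uint, as mentioned above
        uint   primitiveId;
    };

    struct ShadowRay
    {
        float3 origin;
        float3 direction;
        float  tMax;        // distance to the sampled light
        float3 energy;      // contribution to add if the ray is unoccluded
        uint   pixelIndex;
    };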
Wavefront Output is a 1D color buffer to which we add values in Shade and Connect. It is not clamped to the [0, 1] range, so if you implement basic frame accumulation, you can skip clearing this buffer and, before writing the result to the output texture, divide the wavefront output color by the number of accumulated frames.
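A minimal Finalize sketch under those assumptions (accumulatedFrames and the resource names here are made up):

    // Finalize sketch: average the unclamped accumulation, then gamma-correct.
    RWStructuredBuffer<float4> wavefrontOutput : register(u0);
    RWTexture2D<float4>        outputTexture   : register(u1);

    cbuffer FinalizeCB : register(b0)
    {
        uint screenWidth;
        uint screenHeight;
        uint accumulatedFrames;
    };

    [numthreads(8, 8, 1)]
    void Finalize(uint3 id : SV_DispatchThreadID)
    {
        if (id.x >= screenWidth || id.y >= screenHeight)
            return;

        uint   pixelIndex = id.y * screenWidth + id.x;
        float3 color = wavefrontOutput[pixelIndex].rgb / accumulatedFrames;
        color = pow(color, 1.0f / 2.2f);            // gamma correction
        outputTexture[id.xy] = float4(color, 1.0f);
    }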
You can also see that we use atomic counters for shadow rays and new rays. This is a key way to reduce divergence. As mentioned before, we always dispatch screen width * screen height threads, so to keep the threads aligned we have a check in the shader: the first N threads (where N is the current ray count) all do the same work, and the remaining threads exit on a single if statement. This is more performant than launching exactly N threads instead.
Model data can be anything you need for shading. In our case, we store glTF model data in a resource heap and access it bindlessly (a DX12 Shader Model 6.6 feature).
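With Shader Model 6.6 this can look roughly like the following fragment inside a shader (the index and buffer names are assumptions; only ResourceDescriptorHeap itself is the SM 6.6 feature):

    // SM 6.6 bindless sketch: fetch buffers straight from the global descriptor heap.
    // vertexBufferIndex / indexBufferIndex would come from per-instance data.
    StructuredBuffer<float3> positions = ResourceDescriptorHeap[vertexBufferIndex];
    StructuredBuffer<uint>   indices   = ResourceDescriptorHeap[indexBufferIndex];
    float3 p0 = positions[indices[primitiveId * 3 + 0]];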
Notes on shaders:
Random:
For random numbers, we use white noise with an XOR shift generator and a Wang hash for seeding. To get a more diverse random number, we combine the pixel id, frame id, and wavefront loop id into a single seed by multiplying each of these values by a large prime number and adding the results.
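For reference, here is a common version of that setup in HLSL (the exact primes are an arbitrary choice):

    // Wang hash: scrambles the combined seed.
    uint WangHash(uint s)
    {
        s = (s ^ 61u) ^ (s >> 16);
        s *= 9u;
        s ^= s >> 4;
        s *= 0x27d4eb2du;
        s ^= s >> 15;
        return s;
    }

    // XOR shift: advances the RNG state per sample.
    uint XorShift(inout uint state)
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    float RandomFloat(inout uint state)
    {
        return XorShift(state) * 2.3283064365387e-10f; // map uint to [0, 1)
    }

    // Combine the ids with large primes into one seed (primes picked arbitrarily).
    uint MakeSeed(uint pixelId, uint frameId, uint bounceId)
    {
        return WangHash(pixelId * 1973u + frameId * 9277u + bounceId * 26699u);
    }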
Generate:
By offsetting the sample position within the pixel by a random value, you can add anti-aliasing while generating primary rays.
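A sketch of that jitter inside Generate (GetCameraRay and the buffer names are placeholders for however you build primary rays):

    // AA jitter sketch: offset the sample point within the pixel, then build the ray.
    uint   seed   = MakeSeed(pixelIndex, frameId, 0);
    float2 jitter = float2(RandomFloat(seed), RandomFloat(seed)); // in [0, 1)
    float2 uv     = (float2(id.xy) + jitter) / float2(screenWidth, screenHeight);
    rayBatch[pixelIndex] = GetCameraRay(uv);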
Extend:
We work only with active rays, so the main shader body starts with "if (thread idx < ray count){".
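In sketch form (TraceClosestHit stands in for whatever acceleration-structure traversal you use, and the resource names are assumptions):

    // Extend sketch: only the first rayCount threads have a ray to trace.
    StructuredBuffer<Ray>            rayBatch    : register(t0);
    StructuredBuffer<uint>           rayCounter  : register(t1);
    RWStructuredBuffer<ExtendResult> extendBatch : register(u0);

    [numthreads(128, 1, 1)]
    void Extend(uint3 id : SV_DispatchThreadID)
    {
        if (id.x >= rayCounter[0])
            return; // inactive threads exit on this single branch

        Ray ray = rayBatch[id.x];
        extendBatch[id.x] = TraceClosestHit(ray); // BVH traversal, application-specific
    }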
Shade:
Same as in Extend, we work only with active rays.
The "Last Specular" value is used to know if the ray is primary or a primary one that bounced from the mirror. This is very useful for the NEE, as instead of quitting the shader upon hitting a light source, we can add its color (without intensity) to the wavefront output to render our lights without adding energy to the system.
To write to the New Ray Batch or Shadow Ray Batch, you can use InterlockedAdd(), which returns the value of the atomic counter before incrementing it.
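For example, pushing a bounced ray could look like this (buffer and counter names assumed):

    // Compacted write: atomically reserve a slot, then write the new ray there.
    uint slot;
    InterlockedAdd(newRayCounter[0], 1, slot); // slot = counter value before the increment
    newRayBatch[slot] = bouncedRay;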
Connect:
You can pass a threshold value to get rid of fireflies: "if (length(energy.rgb) > threshold) energy.rgb = normalize(energy.rgb) * threshold".
Small Important Details
Current State & Next Steps
Currently, my implementation of the Wavefront is not the most optimal (some values are not compacted, some square roots can be removed, and Connect could be moved outside of the loop). All of this is soon to change.
Another important step is to "go volumetric". Modern path tracers embrace volumetric rendering, which enhances the picture a lot. I have not yet seen articles about merging volumetric rendering with Wavefront. However, I assume that, with the additional ray traversal that volumetric rendering requires, some parts of the Wavefront algorithm will have to change.
Conclusion
If you do GPU path tracing for a large application or for study purposes, Wavefront is a great choice. However, to learn path tracing itself, I still recommend staying away from Wavefront at first, as it will complicate learning the other parts of the algorithm.
Hope this article was helpful. The code will be open-sourced soon. If you have any questions, don't hesitate to ask them publicly here or privately via email (bazz0205@gmail.com) or my LinkedIn.