mastodon.gamedev.place is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastodon server focused on game development and related topics.


Doug Binks

Just realised that in my software GPU wavefront path tracer I could implement efficient virtual memory by storing any 'page miss' paths, sorting them by page, loading/unloading pages, and then processing the relevant paths.

Does anyone know if this has been done before?
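The idea can be sketched on the CPU as: collect the paths that missed, sort them by page, then load each page once and resume all of its waiting paths together. Everything below (struct names, the `std::set` standing in for the actual load/unload step) is illustrative, not from Doug's renderer:

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical path record: just an id plus the octree page it needs.
struct MissedPath {
    uint32_t path_id;
    uint32_t page_id;  // page whose absence stalled this path
};

// Group missed paths by page so each page is loaded once and all of its
// waiting paths are processed together. stable_sort keeps paths that miss
// the same page in submission order.
std::vector<uint32_t> process_misses(std::vector<MissedPath>& misses,
                                     std::set<uint32_t>& resident_pages) {
    std::stable_sort(misses.begin(), misses.end(),
                     [](const MissedPath& a, const MissedPath& b) {
                         return a.page_id < b.page_id;
                     });
    std::vector<uint32_t> processed;
    for (size_t i = 0; i < misses.size();) {
        uint32_t page = misses[i].page_id;
        resident_pages.insert(page);  // stand-in for a real load/unload step
        for (; i < misses.size() && misses[i].page_id == page; ++i)
            processed.push_back(misses[i].path_id);  // resume the path
    }
    return processed;
}
```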

@aras Thanks! That does seem very similar. In my case the pages are octree data rather than textures but the idea appears to be the same.

@dougbinks @aras GPU-driven page-based memory management is one of my interests as of late. I've mainly been looking at textures, because it is the most obvious, but other applications are also very interesting to me. Please let me know if you have any suggestions for APIs or HW :)

A bit of code we just released which shows an example: github.com/NVIDIA-RTX/RTXTS

@anji @aras When I get around to this I'll likely use software paging, as this is fairly simple with the structure I'm using, but I might look at Vulkan sparse buffers at some point.
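Software paging of this kind can be as simple as a hash-map page table that records misses instead of stalling. A hypothetical sketch (all names assumed, and an unordered_map standing in for whatever the octree structure actually uses):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal software page table: maps a virtual octree-page id to a slot in a
// resident pool, recording misses for a later sort/load pass.
struct SoftPageTable {
    std::unordered_map<uint32_t, uint32_t> slot_of_page;  // page id -> pool slot
    std::vector<uint32_t> miss_list;                      // pages to fetch later

    // Returns the resident slot, or -1 after recording a miss.
    int32_t lookup(uint32_t page_id) {
        auto it = slot_of_page.find(page_id);
        if (it != slot_of_page.end()) return static_cast<int32_t>(it->second);
        miss_list.push_back(page_id);
        return -1;
    }
};
```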

@dougbinks Redshift’s (Maxon) renderer is an out-of-core renderer. I believe they combine tiled rendering with something similar to what you described to keep memory usage under control.

@gavkar Tiling is certainly a good idea with out-of-core rendering.

@dougbinks @gavkar hmm is it, in a path tracer? The bounce rays are going to hit “pretty much anything”, both BVH-wise and texture-wise. I thought the screen-space tiling that renderers do is more about a way to distribute work in a render farm.

@aras @gavkar Perhaps, I was thinking that the number of pages missed would be lower.

@dougbinks This is essentially how any GPU path tracer that uses texture caching works, even if it’s not wavefront (for depth-first you have to skip a path that encounters a page miss and restart it later once the page is available). The need to make efficient use of limited PCIe bandwidth, plus the need to not stall an entire warp when a single thread misses the cache, basically forces this implementation by sheer necessity.
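The depth-first skip-and-restart scheme described here might look roughly like this on the CPU (the `step` callback stands in for one bounce of tracing, and the `std::set` for an async page load; all names are assumptions for illustration):

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

struct Path { uint32_t id; bool done = false; };

// Depth-first tracing with skip-and-restart: a path that needs a
// non-resident page is left in the pending set and restarted on a later
// round, once the page has been loaded. 'step' advances a path by one
// bounce and returns the page id it needs next, or 0 when finished.
void trace_all(std::vector<Path>& paths, std::set<uint32_t>& resident,
               std::function<uint32_t(Path&)> step) {
    std::vector<Path*> pending;
    for (auto& p : paths) pending.push_back(&p);
    while (!pending.empty()) {
        std::vector<Path*> next;
        for (Path* p : pending) {
            uint32_t need = step(*p);          // advance one bounce
            if (need == 0) { p->done = true; continue; }
            if (!resident.count(need))
                resident.insert(need);         // stand-in for the page load
            next.push_back(p);                 // step again next round
        }
        pending = next;
    }
}
```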