My team has started looking at subgroup support again. We pushed it out of the initial feature set because of our staffing load and suspected non-portability. We're revisiting it now.

Sadly, and frustratingly, implementations don't do what programmers expect. We are seeing very, very nonportable behavior.

Still collecting data across devices and platforms that we will share soon enough.

Jeremy Ong

@dneto Out of curiosity, can you describe the discrepancy you've observed in expected programmer behavior vs driver behavior?

@ninepoints
Yes, the problem is that programmers expect threads/invocations to reconverge at the end of a control flow divergence, as it appears in the original source code.
But after the intermediate transformations, through several compilers from your original HLSL or whatever down to the machine code, that assumption rarely holds.
Simple cases work by accident, but complex cases fail.
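
To make the expectation concrete, here is a minimal sketch in Python of a toy 4-lane subgroup. It is not real driver or shader behavior. The names `subgroup_add` and `run_after_branch` are hypothetical helpers invented for illustration. The model assumes a subgroup add only sees the lanes that are executing together at that point.

```python
def subgroup_add(values, active_lanes):
    """Toy subgroup add: each active lane gets the sum over the active lanes only."""
    total = sum(values[lane] for lane in active_lanes)
    return {lane: total for lane in active_lanes}


def run_after_branch(values, condition, reconverged):
    """Run a subgroup add placed right after an if/else in the source.

    reconverged=True models what programmers expect: all lanes are back together.
    reconverged=False models the two sides of the branch still running separately.
    """
    lanes = range(len(values))
    taken = [lane for lane in lanes if condition(lane)]
    not_taken = [lane for lane in lanes if not condition(lane)]
    groups = [taken + not_taken] if reconverged else [taken, not_taken]

    results = {}
    for group in groups:
        if group:
            results.update(subgroup_add(values, group))
    return [results[lane] for lane in lanes]


values = [1, 2, 3, 4]
is_even_lane = lambda lane: lane % 2 == 0

expected = run_after_branch(values, is_even_lane, reconverged=True)
observed = run_after_branch(values, is_even_lane, reconverged=False)
print(expected)  # [10, 10, 10, 10]
print(observed)  # [4, 6, 4, 6]
```

In this toy model the same source yields a different answer per lane depending only on whether the lanes rejoined before the subgroup operation, which is the kind of discrepancy described above.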

@ninepoints
The systemic problem is: most compiler stacks are CPU-oriented (almost always LLVM-derived) and don't understand the need to preserve structured control flow. Simple transforms that are fine on a CPU break GPU multithreaded semantics.
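
As one illustration of such a transform (picked as an example, not necessarily one of the cases observed here), consider sinking an operation from after an if/else into both arms. The Python sketch below is a toy model, and `subgroup_add`, `original` and `after_sinking` are hypothetical names. With a single lane, which is how a CPU compiler tends to reason, the two versions agree. With four lanes they do not.

```python
def subgroup_add(values, lanes):
    """Toy subgroup add: each participating lane gets the sum over those lanes."""
    total = sum(values[lane] for lane in lanes)
    return {lane: total for lane in lanes}


def original(a, b, cond):
    """Source order: pick x inside the branch, then add across the whole subgroup."""
    lanes = list(range(len(cond)))
    x = [a[lane] if cond[lane] else b[lane] for lane in lanes]
    result = subgroup_add(x, lanes)  # all lanes participate together
    return [result[lane] for lane in lanes]


def after_sinking(a, b, cond):
    """Transformed order: the add was duplicated into each arm of the branch."""
    lanes = list(range(len(cond)))
    then_lanes = [lane for lane in lanes if cond[lane]]
    else_lanes = [lane for lane in lanes if not cond[lane]]
    result = {}
    if then_lanes:
        result.update(subgroup_add(a, then_lanes))  # only 'then' lanes participate
    if else_lanes:
        result.update(subgroup_add(b, else_lanes))  # only 'else' lanes participate
    return [result[lane] for lane in lanes]


# One lane: the transform looks harmless.
single_same = original([5], [7], [True]) == after_sinking([5], [7], [True])

# Four lanes with a divergent condition: the results differ.
a = [1, 1, 1, 1]
b = [10, 10, 10, 10]
cond = [True, False, True, False]
before = original(a, b, cond)
after = after_sinking(a, b, cond)
print(single_same)  # True
print(before)       # [22, 22, 22, 22]
print(after)        # [2, 20, 2, 20]
```

Per-thread reasoning cannot tell the two versions apart, so a compiler needs some notion of which operations depend on the set of lanes executing together.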

LLVM tried to address this with the "convergent" attribute but it's not quite right.
See @nh 's LLVM dev meeting talk "Evolving 'convergent': Lessons from Control Flow in AMDGPU" slides and video at llvm.org/devmtg/2020-09/progra

@ninepoints @nh

Further, D3D and Metal provide handwavy promises at best, and we're getting better data showing that you can't rely on them.

Vulkan at least gives crisp rules about convergence/reconvergence, but they are very loose by default, and programmers are surprised by how loose they are.
A recent improvement in Vulkan land is VK_KHR_shader_subgroup_uniform_control_flow which you have to opt into to get a bit tighter behaviour, but still less than programmers assume.

@ninepoints @nh

We'll write up our findings in a digestible form to present to the W3C WebGPU/WGSL group.

@dneto @nh Thanks David, appreciate the context