Aras Pranckevičius @aras

**David Neto** @dneto · Sep 8, 2023

My team started looking at subgroup support for #WebGPU again. We pushed it out of the initial #WGSL feature set due to our staffing load and suspected non-portability. We are revisiting it now.

Sadly, and frustratingly, implementations don't do what programmers think should happen. We are seeing very, very non-portable behavior.

We are still collecting data across devices and platforms, which we will share soon.

Jeremy Ong @ninepoints@mastodon.gamedev.place

@dneto Out of curiosity, can you describe the discrepancy you've observed between the behavior programmers expect and what drivers actually do?

Sep 08, 2023, 02:27 AM

**David Neto** @dneto · Sep 8, 2023

@ninepoints
Yes, the problem is that programmers expect threads/invocations to reconverge at the end of a control flow divergence, as that divergence appears in the original source code.
But after the intermediate transformations, through several compilers from your original HLSL or whatever down to the machine, that assumption rarely holds.
Simple cases work by accident, but complex cases fail.
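
To make the expectation concrete, here is a toy Python model (not real GPU code) of a 4-lane subgroup that runs an if/else and then takes a ballot of active lanes. The function name `ballot_after_branch` and the `reconverges` flag are hypothetical, introduced only for illustration; actual hardware and driver behavior is more varied than this two-way switch suggests.

```python
def ballot_after_branch(conds, reconverges):
    """Return, for each lane, the active-lane mask it observes after an if/else.

    conds[i] is the branch condition evaluated by lane i.
    reconverges=True models what programmers usually assume: all lanes
    are active together again once the if/else ends.
    reconverges=False models lanes still being split by branch arm.
    """
    lanes = range(len(conds))
    full = sum(1 << i for i in lanes)                 # every lane active
    taken = sum(1 << i for i in lanes if conds[i])    # lanes in the 'if' arm
    not_taken = full & ~taken                         # lanes in the 'else' arm
    if reconverges:
        return [full for _ in lanes]
    return [taken if conds[i] else not_taken for i in lanes]


conds = [True, False, True, False]
expected = ballot_after_branch(conds, reconverges=True)
possible = ballot_after_branch(conds, reconverges=False)
print([bin(m) for m in expected])  # ['0b1111', '0b1111', '0b1111', '0b1111']
print([bin(m) for m in possible])  # ['0b101', '0b1010', '0b101', '0b1010']
```

In the second case each lane only "sees" the lanes that took the same branch arm, so any subgroup operation placed after the if/else would work on a partial set of lanes.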

**David Neto** @dneto · Sep 8, 2023

@ninepoints
The systemic problem is: most compiler stacks are built for CPUs (almost always LLVM-derived) and don't understand the need to preserve structured control flow. Simple transforms that are fine on a CPU mess up GPU multithreaded semantics.

LLVM tried to address this with the "convergent" attribute, but it's not quite right.
See @nh's LLVM dev meeting talk "Evolving 'convergent': Lessons from Control Flow in AMDGPU"; slides and video are at https://llvm.org/devmtg/2020-09/program/
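
As a rough illustration of why a CPU-safe transform can be unsafe on a GPU, the Python sketch below models a program before and after a subgroup add is copied into both arms of a branch (tail duplication is used here only as an assumed example of such a transform; it is not stated in the thread which transforms cause the observed failures). The helpers `subgroup_add`, `original`, and `tail_duplicated` are hypothetical names for this toy model.

```python
def subgroup_add(values, active):
    """Sum values over the active lanes; every active lane sees that sum."""
    return sum(values[i] for i in active)


def per_lane_work(values, conds):
    # The divergent if/else: each lane computes its own x.
    return [v * 2 if c else v + 1 for v, c in zip(values, conds)]


def original(values, conds):
    # if/else, then ONE subgroup_add executed by all lanes together.
    n = len(values)
    x = per_lane_work(values, conds)
    total = subgroup_add(x, range(n))
    return [total] * n


def tail_duplicated(values, conds):
    # Same program after the add was copied into each branch arm:
    # each arm's add only covers the lanes that took that arm.
    n = len(values)
    x = per_lane_work(values, conds)
    then_lanes = [i for i in range(n) if conds[i]]
    else_lanes = [i for i in range(n) if not conds[i]]
    then_total = subgroup_add(x, then_lanes)
    else_total = subgroup_add(x, else_lanes)
    return [then_total if conds[i] else else_total for i in range(n)]


# One thread (the CPU view): the two versions are indistinguishable.
print(original([3], [True]), tail_duplicated([3], [True]))  # [6] [6]

# Four lanes (the GPU view): the results differ.
values, conds = [1, 2, 3, 5], [True, False, True, False]
print(original(values, conds))         # [17, 17, 17, 17]
print(tail_duplicated(values, conds))  # [8, 9, 8, 9]
```

With a single thread there is nothing to observe across lanes, which is why a compiler reasoning purely per-thread would consider the rewrite harmless.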

**David Neto** @dneto · Sep 8, 2023

@ninepoints @nh

Further, D3D and Metal provide handwavy promises at best, and we're getting better data showing that you can't rely on them.

Vulkan at least gives crisp rules about convergence/reconvergence, but they are very loose by default, and programmers are surprised by how loose they are.
A recent improvement in Vulkan land is VK_KHR_shader_subgroup_uniform_control_flow, which you have to opt into to get somewhat tighter behaviour, but it still guarantees less than programmers assume.