So it turns out I was doing something dumb in that Vulkan benchmark post I shared yesterday (I hadn't considered that I might be hitting a perf bottleneck that was skewing the results...)

I've since updated the tests, re-ran everything, and posted an updated article on my site:
kylehalladay.com/blog/tutorial

Hoooorraaaayy for extremely public mistakes!

@khalladay Nice! It's interesting that push constants are the slowest for the bistro scene. Maybe it scales badly with the number of draw calls/push constant calls?

I see you're resetting the existing command buffer, so it should keep enough allocated space to hold all the push constant data.

What's the reason for choosing HOST_CACHED memory btw? Wouldn't HOST_VISIBLE be better for CPU->GPU data? That's what I am using at least, but I could be doing things wrong.

@Gohla I went with HOST_CACHED because then I could update the transform data directly on the mapped pointer, and then flush all the changes at once. I only needed write access to that pointer, so I *think* that HOST_VISIBLE is unnecessary (is that correct? It feels like an unfounded assumption on my part)

I actually found that some cases performed better with HOST_VISIBLE | HOST_CACHED, while other cases performed better with HOST_CACHED. I'm not sure of the reasons for that yet.

@Gohla It seems like that perf difference might be specific to Nvidia cards (which have a lot of extra memory types hidden behind the scenes), since I think that, according to the spec, any memory type that is HOST_CACHED or HOST_COHERENT is also HOST_VISIBLE

(as per: khronos.org/registry/vulkan/sp)
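
To make that rule concrete, here is a minimal sketch in Python (chosen only so the example runs without a Vulkan SDK). The flag values mirror the VkMemoryPropertyFlagBits constants; the helper name `implies_host_visible` and the sample list are made up for illustration and are not taken from any real device.

```python
# Values mirror VkMemoryPropertyFlagBits from the Vulkan headers.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8


def implies_host_visible(flags):
    """Return True if these flags respect the rule that HOST_COHERENT
    or HOST_CACHED only ever appear together with HOST_VISIBLE."""
    if flags & (HOST_COHERENT | HOST_CACHED):
        return bool(flags & HOST_VISIBLE)
    return True  # the rule says nothing about this memory type


# Hypothetical memory types, not a real device report.
sample_types = [
    DEVICE_LOCAL,
    HOST_VISIBLE | HOST_COHERENT,
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,
]
all_ok = all(implies_host_visible(f) for f in sample_types)
```

If the rule holds, a plain mask-matching search for HOST_CACHED and one for HOST_VISIBLE | HOST_CACHED should match the same memory types, so a measured difference between the two would probably have to come from somewhere else.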

@Gohla Or it could also just be another mistake I've made somewhere in the program that's making it look like a perf difference there XD

@khalladay You're right! On my Nvidia card all host visible memory is either coherent, or coherent and cached (vulkan.gpuinfo.org/displayrepo), so I use one of those types.

I think you only need to request HOST_VISIBLE to be able to map/flush on the host, but HOST_CACHED also caches the memory on the host (which may improve performance?).

I was under the impression that you'd only need caching when reading GPU->CPU data, but have no data to support that claim :)

@Gohla Interesting, so this would suggest that instead of using HOST_CACHED | HOST_VISIBLE for the matrix data, the better choice would have been HOST_VISIBLE | DEVICE_LOCAL

@khalladay Yes, but only on AMD. Nvidia cards do not have that memory type :(
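
One way to handle that vendor difference is to ask for the preferred combination first and fall back when no memory type matches. The sketch below is in Python so it runs without a Vulkan SDK; the function names `find_memory_type` and `pick_upload_type` and both memory type lists are hypothetical, loosely following what is described in this thread rather than copied from real device reports.

```python
# Values mirror VkMemoryPropertyFlagBits from the Vulkan headers.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8


def find_memory_type(type_flags, type_bits, wanted):
    """Return the index of the first memory type allowed by type_bits
    whose flags contain every bit in wanted, or None if there is none."""
    for index, flags in enumerate(type_flags):
        allowed = type_bits & (1 << index)
        if allowed and (flags & wanted) == wanted:
            return index
    return None


def pick_upload_type(type_flags, type_bits):
    """Prefer HOST_VISIBLE | DEVICE_LOCAL, else fall back to HOST_VISIBLE."""
    index = find_memory_type(type_flags, type_bits, HOST_VISIBLE | DEVICE_LOCAL)
    if index is None:
        index = find_memory_type(type_flags, type_bits, HOST_VISIBLE)
    return index


# Hypothetical layouts, not real device reports.
amd_like = [
    DEVICE_LOCAL,
    HOST_VISIBLE | HOST_COHERENT,
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,
    DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,
]
nvidia_like = amd_like[:3]  # same list minus the DEVICE_LOCAL | HOST_VISIBLE type

ALL_TYPES = 0xFFFFFFFF  # stand-in for VkMemoryRequirements::memoryTypeBits
amd_choice = pick_upload_type(amd_like, ALL_TYPES)
nvidia_choice = pick_upload_type(nvidia_like, ALL_TYPES)
```

With these made-up lists, the AMD-like layout selects index 3 and the Nvidia-like layout falls back to index 1, the first host-visible type.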
