So it turns out I was doing something dumb in that Vulkan benchmark post I shared yesterday (I hadn't considered that I might be hitting a perf bottleneck that was skewing the results...)
I've since updated the tests, re-ran everything, and posted an updated article on my site:
Hoooorraaaayy for extremely public mistakes!
@khalladay Nice! It's interesting that push constants are the slowest for the bistro scene. Maybe it scales badly with the number of draw calls/push constant calls?
I see you're resetting the existing command buffer, so it should keep enough allocated space to hold all the push constant data.
What's the reason for choosing HOST_CACHED memory btw? Wouldn't HOST_VISIBLE be better for CPU->GPU data? That's what I am using at least, but I could be doing things wrong.
@Gohla I went with HOST_CACHED because then I could update the transform data directly on the mapped pointer, and then flush all the changes at once. I only needed write access to that pointer, so I *think* that HOST_VISIBLE is unnecessary (is that correct? that feels like an unfounded assumption I made)
I actually found that some cases performed better with HOST_VISIBLE | HOST_CACHED, while other cases performed better with HOST_CACHED. I'm not sure of the reasons for that yet.
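For anyone following along, here is a minimal sketch of the "write through the mapped pointer, then flush once" pattern described above. It only covers the arithmetic: on non-coherent memory, the offset passed to vkFlushMappedMemoryRanges has to be a multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize, and the size has to be a multiple too unless the range ends exactly at the end of the allocation. `FlushRange` and `alignFlushRange` are hypothetical names, not part of Vulkan or of the benchmark code, and the sketch assumes the whole allocation is mapped.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper type: in real code these two values would be copied
// into the offset and size members of a VkMappedMemoryRange.
struct FlushRange {
    std::uint64_t offset;
    std::uint64_t size;
};

// Expand the dirty range [offset, offset + size) so that both ends land on
// multiples of atomSize (nonCoherentAtomSize), clamping the end to the
// allocation size, which the flush rules also accept.
inline FlushRange alignFlushRange(std::uint64_t offset, std::uint64_t size,
                                  std::uint64_t atomSize,
                                  std::uint64_t allocationSize)
{
    const std::uint64_t begin = (offset / atomSize) * atomSize;
    const std::uint64_t roundedEnd =
        ((offset + size + atomSize - 1) / atomSize) * atomSize;
    const std::uint64_t end = std::min(roundedEnd, allocationSize);
    return FlushRange{begin, end - begin};
}
```

Tracking the lowest and highest byte touched during a frame and passing that single range through a helper like this gives one flush per frame instead of one per transform.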
@Gohla That perf difference might be specific to NVIDIA cards (which have a lot of extra memory types hidden behind the scenes), since I think that according to the spec, the only HOST_CACHED or HOST_COHERENT memory types you can find are also HOST_VISIBLE
@Gohla Or it could also just be another mistake I've made somewhere in the program that's making it look like a perf difference there XD
@khalladay You're right! On my Nvidia card all host visible memory is either coherent, or coherent and cached (http://vulkan.gpuinfo.org/displayreport.php?id=2776#memory), so I use one of those types.
I think you only need to request HOST_VISIBLE to be able to map/flush on the host, but HOST_CACHED also caches the memory on the host (which may improve performance?).
I was under the impression that you'd only need caching when reading GPU->CPU data, but have no data to support that claim :)
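A rough sketch of what the flags discussed here mean for host code: mapping requires HOST_VISIBLE, memory that is visible but not HOST_COHERENT needs explicit vkFlushMappedMemoryRanges / vkInvalidateMappedMemoryRanges calls, and HOST_CACHED only says that host accesses go through the CPU cache. The constants below mirror the numeric values of VkMemoryPropertyFlagBits so the snippet compiles without the Vulkan SDK; `HostAccess` and `describeHostAccess` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Stand-ins mirroring VK_MEMORY_PROPERTY_*_BIT values (assumption: used here
// instead of <vulkan/vulkan.h> so the sketch is self-contained).
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Hypothetical summary of what the host may do with a memory type.
struct HostAccess {
    bool mappable;            // vkMapMemory is allowed
    bool needsExplicitFlush;  // host writes need a flush, host reads an invalidate
    bool cachedOnHost;        // host accesses go through the CPU cache
};

inline HostAccess describeHostAccess(std::uint32_t propertyFlags)
{
    const bool visible  = (propertyFlags & kHostVisible) != 0;
    const bool coherent = (propertyFlags & kHostCoherent) != 0;
    const bool cached   = (propertyFlags & kHostCached) != 0;
    return HostAccess{visible, visible && !coherent, visible && cached};
}
```

On a device where every host-visible type is also coherent, as in the linked report, `needsExplicitFlush` would come out false for all mappable types.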
@khalladay Coincidentally, some great GDC slides on this topic were just released: http://www.gdcvault.com/play/1025458/Advanced-Graphics-Techniques-Tutorial-New
@Gohla Interesting, so this would suggest that instead of using HOST_CACHED | HOST_VISIBLE for the matrix data, the better choice would have been HOST_VISIBLE | DEVICE_LOCAL
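One way to act on that suggestion, as a sketch only: try HOST_VISIBLE | DEVICE_LOCAL first and fall back to plain host-visible types when the device does not expose that combination. The preference order is an assumption drawn from this thread, not something measured here. The flag constants mirror VkMemoryPropertyFlagBits values, `typeFlags` stands in for the propertyFlags of each entry in VkPhysicalDeviceMemoryProperties::memoryTypes (at most 32 entries), `allowedTypeBits` stands in for VkMemoryRequirements::memoryTypeBits, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Stand-ins mirroring VK_MEMORY_PROPERTY_*_BIT values (assumption: used so
// the sketch compiles without the Vulkan headers).
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;

// Return the first allowed memory type that has every wanted flag, or -1.
inline int findMemoryType(const std::vector<std::uint32_t>& typeFlags,
                          std::uint32_t allowedTypeBits, std::uint32_t wanted)
{
    for (std::size_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed = (allowedTypeBits & (1u << i)) != 0;
        if (allowed && (typeFlags[i] & wanted) == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Try the preferred combinations in order for per-frame CPU->GPU data.
inline int pickUploadMemoryType(const std::vector<std::uint32_t>& typeFlags,
                                std::uint32_t allowedTypeBits)
{
    for (std::uint32_t wanted : {kDeviceLocal | kHostVisible,
                                 kHostVisible | kHostCoherent,
                                 kHostVisible}) {
        const int index = findMemoryType(typeFlags, allowedTypeBits, wanted);
        if (index >= 0) {
            return index;
        }
    }
    return -1;  // nothing mappable: a staging copy would be needed instead
}
```

Whether the device-local, host-visible type actually wins for the matrix data would still need to be benchmarked per GPU, since its size and behavior vary between vendors.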