mastodon.gamedev.place is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon server focused on game development and related topics.


"Zen5's AVX512 Teardown + More..." by Alexander J. Yee

Fantastic writeup – exciting and surprising things happening in the AVX-512 world.

numberworld.org/blogs/2024_8_7

Personal highlights:
* 2048-bit per cycle SIMD throughput – incredible.
* 2 VPERMI2B per cycle – whoa 🤯 A64 cores are a _long_ way off competing with that shuffle
* 3 PDEP/PEXT per cycle – I love to see it; it'll be fun to see new code using them. (Scalar, and 3x current SVE throughput.)
* 3 CRCs per cycle – neat, but awkward to use on big buffers (need to compute 9 independent CRCs and merge them?)
* 1 per cycle VP2INTERSECTD – hilarious
* 2-cycle SIMD latency – welcome to the party, pal!
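For anyone who hasn't used PDEP/PEXT, here's a rough Python model of what the BMI2 instructions compute. It loops bit by bit, so it only illustrates the semantics; the hardware does each in a single instruction. `morton2d` is a hypothetical example of the kind of code that benefits: interleaving two coordinates into a Z-order (Morton) index, a common trick for spatial hashing and swizzled texture layouts.

```python
def pdep(src, mask, width=64):
    """Parallel bit deposit: scatter the low bits of src into the set bits of mask."""
    result = 0
    k = 0  # index of the next src bit to deposit
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> k) & 1:
                result |= 1 << i
            k += 1
    return result


def pext(src, mask, width=64):
    """Parallel bit extract: gather the bits of src selected by mask into the low bits."""
    result = 0
    k = 0  # index of the next result bit to fill
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> i) & 1:
                result |= 1 << k
            k += 1
    return result


def morton2d(x, y):
    """Hypothetical example: put x in the even bits and y in the odd bits (Z-order index)."""
    return pdep(x, 0x5555555555555555) | pdep(y, 0xAAAAAAAAAAAAAAAA)


print(bin(pext(0b10110110, 0b11110000)))  # 0b1011
print(morton2d(3, 5))  # 39
```

The reverse direction (decoding a Morton index) is just `pext` with the same two masks, which is why fast PDEP/PEXT makes this style of bit shuffling attractive.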
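On the CRC question: the usual way to keep several CRC units busy is to split the buffer into independent chunks, CRC each chunk separately, and merge the results afterwards. Below is a minimal sketch, assuming CRC-32C (the Castagnoli polynomial that the x86 `CRC32` instruction implements), with a portable table-driven loop standing in for the instruction. The default chunk count of 9 just mirrors the guess above rather than a measured figure, and all function names are hypothetical.

```python
POLY = 0x82F63B78  # CRC-32C (Castagnoli), bit-reflected form


def _make_table():
    """Byte-at-a-time lookup table for the reflected CRC-32C polynomial."""
    table = []
    for b in range(256):
        c = b
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def _update(reg, data):
    """Raw register update with no initial or final inversion (linear over GF(2))."""
    for byte in data:
        reg = TABLE[(reg ^ byte) & 0xFF] ^ (reg >> 8)
    return reg


def crc32c(data):
    """Standard CRC-32C: init 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    return _update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def crc32c_combine(crc_a, crc_b, len_b):
    """Return crc32c(A + B) given crc32c(A), crc32c(B) and len(B)."""
    # Running crc_a through len_b zero bytes with the raw update multiplies it
    # by x^(8*len_b) mod P. The init and final inversions cancel out here
    # because both are 0xFFFFFFFF.
    return _update(crc_a, bytes(len_b)) ^ crc_b


def crc32c_split(data, n=9):
    """Hypothetical helper: CRC n independent chunks, then merge left to right."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    # These chunk CRCs don't depend on each other, so on real hardware their
    # instruction streams could be interleaved to keep several units busy.
    crcs = [crc32c(chunk) for chunk in chunks]
    result = crcs[0]
    for crc, chunk in zip(crcs[1:], chunks[1:]):
        result = crc32c_combine(result, crc, len(chunk))
    return result
```

The merge here shifts `crc_a` by feeding `len_b` zero bytes through the raw update, which costs O(len_b) and is only there for clarity. Practical implementations do the same shift with precomputed x^(8n) mod P constants, or in logarithmic time as zlib's `crc32_combine` does for plain CRC-32.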

Arseny Kapoulkine

@dougall Thanks for sharing! This will need to be shared again in a week, I guess, when embargoes lift :D

I'm a little sad that the AVX512 gains don't translate to 256-wide SIMD, and that 256-wide code is mostly just harmed by the latency increases. Given the crazy situation with Intel & 512-wide SIMD, I don't expect that to be an attractive target for many years, and it would be nice to maximize 256-wide perf.

@zeux Heh, true, looking forward to reading it! :)

Yeah, it's a big mess... I'm hoping people write the AVX-512 code to make it competitive, and that most 256-bit code isn't too latency-sensitive, but there may be other balance issues that make it tricky (e.g. memory bandwidth). It could be especially awkward if it turns out Intel made the right move, and AMD also rolls it back further down the line. Cortex-X925 is going for 6x128-bit, and I think 6x256-bit from AMD would have been better received.