In light of the AVX10 news I've rerun the benchmarks with recent code (some tweaks to idx) & AVX512 config.
I wrote this (128-bit) AVX512 code in 2019. I haven't gone ahead and enabled it by default because of the muted gains and the AVX512 deployment story on clients, which continues to be Weird, but hey, at least there's hope that one day client Intel CPUs will get it :) The Intel chip here is a server Sapphire Rapids, where AVX512 has been doing quite well for a while now.
@zeux Oh cool, they're back to all-512b. Wake me up when they actually ship something to end users.
@zeux At this point the actual client AVX512 landscape consists of a handful of Ice Lake/Tiger Lake laptops, even fewer Rocket Lake desktops, and just... AMD, which has been actually shipping this For Real since Zen 4. I know which one I'm going to be targeting in practice.
@zeux Anyway, extrapolating from current trends as indicated in the revision history, my guess is that the AVX10 whitepaper 4.0, to be released sometime at the end of this year or early next year, will also drop 512-bit support.
@Doomed_Daniel @zeux No, original proposal 1.0 was 128b+256b+512b impls allowed, 2.0 was only 256b+512b impls allowed, 3.0 is only 512b impls allowed, extrapolated 4.0 is "do not implement this"
@rygorous @Doomed_Daniel @zeux AVX10 revision 4.0 is going to resurrect 3DNow. You heard it here first!
@aras @rygorous @Doomed_Daniel @zeux actually, it's called AiNOW this time
@rygorous Yeah I had an Ice Lake laptop and was excited about this ISA being useful…
@rygorous Geez those gains seem extremely meh. :(
@malwareminigun The stuff Arseny posted is not 512-bit, that's all 128-bit, just with new AVX512 instructions. (All the new AVX512 instructions also have 256b and 128b forms.)
@malwareminigun i.e. those are the gains purely from the new instructions added being useful to the problem at hand without getting any wider. 10%-ish gains from added instructions without going wider is quite substantial.
@rygorous Ah, that's fair