#avx512

#dotnet #AVX512 code gen is unfortunately not great in the face of masks which Sep uses heavily. This means AVX-512 is slower than AVX2 🤔

cc @tannergooding can this be improved?

PS: While I understand the arguments for not having explicit mask types in dotnet I still think it will never be great, since it will be an endless whack-a-mole around code gen... compared to letting devs be able to do what they want.
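For readers unfamiliar with what "masks" means here: an AVX-512 byte compare produces a bitmask with one bit per lane, rather than a full vector of 0x00/0xFF bytes as in AVX2. In C, AVX-512BW exposes this as `_mm512_cmpeq_epi8_mask`, which returns a `__mmask64`. The block below is only a portable scalar model of that operation, written in C rather than .NET. It is not Sep's code, and the names `byte_eq_mask` and `first_match` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 64-lane byte compare that yields a bitmask:
   bit i of the result is set when data[i] == needle.
   At most 64 bytes are examined; lanes at or beyond len stay 0. */
uint64_t byte_eq_mask(const unsigned char *data, size_t len, unsigned char needle)
{
    uint64_t mask = 0;
    if (len > 64)
        len = 64;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == needle)
            mask |= (uint64_t)1 << i;
    }
    return mask;
}

/* Index of the lowest set bit in the mask, or -1 if the mask is empty.
   A parser would use this to locate the next separator in the block. */
int first_match(uint64_t mask)
{
    for (int i = 0; i < 64; i++) {
        if (mask & ((uint64_t)1 << i))
            return i;
    }
    return -1;
}
```

With the input `"a;b;c"` and needle `';'`, bits 1 and 3 are set. A parser that works this way spends most of its time producing, combining, and scanning such masks, so the quality of the code generated around mask values has a direct effect on throughput.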

#AMD #Ryzen9000 vs. #Intel #CoreUltra #ArrowLake On #Linux For Q1-2025 In ~400 Benchmarks
In cases where #AVX512 can be utilized, the Ryzen 9000 series is the definitive winner over the Intel Core Ultra Series 2 desktop processors. In some HPC applications the Core Ultra 9 285K with 24 physical cores does well in scenarios where SMT isn't leveraged.
Overall, the #Zen5-based #Ryzen9 #9950X finished in first place 50% of the time.
phoronix.com/review/ryzen9000-


How many xmm/ymm/zmm registers did x86 have vs. x86-64? I'm seeing conflicting information on Google: some pages claim that x86 had zmm0-zmm31, while others claim it only had up to xmm15/ymm15/zmm15. (This might be because some webpages are conflating x86-64 in 32-bit mode with the x86 architecture proper.)

#asm #assembly #x86

Scaling an RGB image: godbolt.org/z/vMojsrhcG

GCC can only vectorize it on RVV, where it generates nice code with three indexed loads and a three-segment segmented store. It fails for AVX512/NEON.

clang manages something with AVX512, but you can barely call it vectorization.
The RVV codegen looks better, but it uses fixed length vectorization and seems to have miscalculated the best LMUL choice, which causes it to spill. You get better codegen if you set -mllvm --riscv-v-fixed-length-vector-lmul-max=4.

godbolt.org · Compiler Explorer (C):

    void scaleImg(size_t nh, size_t nw, size_t ow, size_t oh, float f,
                  unsigned char *restrict o, unsigned char *restrict n)
    {
        for (int y = 0; y < nh; y++) {
            for (int x = 0; x < nw; x++) {
                int ox = (int)(x * f);
                int oy = (int)(y * f);
                n[(y * nw + x) * 3 + 0] = o[(oy * ow + ox) * 3 + 0];
                n[(y * nw + x) * 3 + 1] = o[(oy * ow + ox) * 3 + 1];
                n[(y * nw + x) * 3 + 2] = o[(oy * ow + ox) * 3 + 2];
            }
        }
    }

#RVV #AVX512 #NEON
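A small way to see why this loop is awkward to vectorize: every output pixel reads from a computed source index (nearest-neighbour sampling), so a vector version needs indexed loads (gathers) for the reads and an interleaved three-channel store for the writes. The sketch below repeats `scaleImg` from the link so that it compiles on its own, and adds a helper, `demo_upscale`, which is hypothetical and not part of the original post.

```c
#include <stddef.h>

/* scaleImg as posted in the Compiler Explorer link (only the layout changed). */
void scaleImg(size_t nh, size_t nw, size_t ow, size_t oh, float f,
              unsigned char *restrict o, unsigned char *restrict n)
{
    (void)oh; /* unused in the posted code */
    for (int y = 0; y < nh; y++) {
        for (int x = 0; x < nw; x++) {
            /* Source coordinates depend on x and y, so the loads are
               data-dependent "gathers" rather than contiguous reads. */
            int ox = (int)(x * f);
            int oy = (int)(y * f);
            n[(y * nw + x) * 3 + 0] = o[(oy * ow + ox) * 3 + 0];
            n[(y * nw + x) * 3 + 1] = o[(oy * ow + ox) * 3 + 1];
            n[(y * nw + x) * 3 + 2] = o[(oy * ow + ox) * 3 + 2];
        }
    }
}

/* Hypothetical helper: upscale a 2x1 RGB image to 4x2 with f = 0.5.
   out must have room for 4 * 2 * 3 = 24 bytes. */
void demo_upscale(unsigned char out[24])
{
    unsigned char src[6] = {10, 20, 30,   /* pixel 0 */
                            40, 50, 60};  /* pixel 1 */
    scaleImg(2, 4, 2, 1, 0.5f, src, out);
}
```

With `f = 0.5`, output columns 0 and 1 both read source pixel 0, and columns 2 and 3 read source pixel 1, so neighbouring output lanes do not map to neighbouring source addresses in general.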

#FFmpeg devs boast of up to 94x performance boost after implementing handwritten #AVX512 assembly code
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements -- from three to 94 times faster -- compared to standard implementations.
tomshardware.com/pc-components

Tom's Hardware · FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code · By Anton Shilov

#Intel #CoreUltra 9 285K "#ArrowLake" Delivers Strong #Linux Performance Review
Power efficiency improvements with Arrow Lake are real. Core Ultra 9 285K on average was at 136W, right in line with the 137W Ryzen 9 9950X and much lower than the 156W average of the Core i9 14900K. Core Ultra 9 285K was very competitive, but for heavy #AVX512 workloads and other areas where Zen 5 was delivering striking wins, the Ryzen 9 9950X and the ~$429 Ryzen 9 9900X can deliver great value.
phoronix.com/review/intel-core


An interview with #AMD's #MikeClark, Father of Zen — 'Zen Daddy' says 3nm #Zen5 is coming fast; also talks compact cores for desktop
AMD expands its Zen 5 architecture. Unlike Intel, which has to reduce clock speeds when its processors run #AVX512 workloads, AMD says these powerful instructions will run at the same clock speeds as standard integer operations. Clark also expanded on how the company achieved that feat and said that its #Zen5c cores can also run full AVX512.
tomshardware.com/pc-components

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries [..]. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size [..]. FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4."

techpowerup.com/321201/amd-zen

TechPowerUp: AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reported as high as 40% performance increases over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the ans...
#AMD #Zen5 #CPU

#Benchmarking The Experimental #Ubuntu #x86_64_v3 Build For Greater Performance On Modern #CPU
With x86-64-v3 basically being #Intel #Haswell and #AMD Excavator or newer (with some exceptions like select Atoms), it would be really interesting if #Canonical would also consider an x86-64-v4 option for modern systems with #AVX512 support. Even an experimental Ubuntu x86-64-v4 build would show what that could mean for #servers and #HPC
phoronix.com/review/ubuntu-x86
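As a rough illustration of what separates these levels: x86-64-v3 adds AVX2, BMI1/BMI2, FMA and related extensions on top of v2, and x86-64-v4 adds the AVX-512 F/BW/CD/DQ/VL subsets. The sketch below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` to estimate which level the current CPU reaches. The function name `approx_x86_64_level` is made up, and the check covers only a representative subset of each level's required features, so treat the result as an approximation.

```c
/* Estimate the x86-64 microarchitecture level (1..4) of the running CPU.
   Returns 0 when not built for x86-64 with GCC or Clang.
   Assumption: only a representative subset of each level's features is
   tested, so this is not a complete conformance check. */
int approx_x86_64_level(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();

    /* x86-64-v2: SSE4.2 and POPCNT, among others. */
    if (!__builtin_cpu_supports("sse4.2") || !__builtin_cpu_supports("popcnt"))
        return 1;

    /* x86-64-v3: AVX2, BMI2 and FMA, among others. */
    if (!__builtin_cpu_supports("avx2") || !__builtin_cpu_supports("bmi2") ||
        !__builtin_cpu_supports("fma"))
        return 2;

    /* x86-64-v4: the AVX-512 F, BW, CD, DQ and VL subsets. */
    if (!__builtin_cpu_supports("avx512f") || !__builtin_cpu_supports("avx512bw") ||
        !__builtin_cpu_supports("avx512cd") || !__builtin_cpu_supports("avx512dq") ||
        !__builtin_cpu_supports("avx512vl"))
        return 3;

    return 4;
#else
    return 0;
#endif
}
```

A distribution built for a given level assumes these instructions unconditionally, which is why a v4 build would only be usable on systems with AVX-512 support.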