
**nietras** @nietras@mastodon.social · 5d

#dotnet #AVX512 code gen is unfortunately not great in the face of masks, which Sep uses heavily. This means AVX-512 is slower than AVX2.

cc @tannergooding can this be improved?

PS: While I understand the arguments for not having explicit mask types in dotnet I still think it will never be great, since it will be an endless whack-a-mole around code gen... compared to letting devs be able to do what they want.
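
For readers unfamiliar with what "masks" means here, the following is a minimal scalar sketch in C, not code from Sep or from the .NET runtime. It models a 64-lane byte compare that yields one bit per lane, which is the shape of result that C intrinsics such as `_mm512_cmpeq_epi8_mask` return as a `__mmask64`. The function names `eq_mask64` and `mask_to_positions` are hypothetical, and the assumption that a separator scan is the relevant operation is mine, not the post's.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for a 64-lane byte compare that produces a bitmask:
   bit i of the result is set when block[i] == needle. */
static uint64_t eq_mask64(const unsigned char block[64], unsigned char needle)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        mask |= (uint64_t)(block[i] == needle) << i;
    }
    return mask;
}

/* Expand a mask into the list of set lane indices, lowest lane first.
   Returns how many indices were written to out. */
static size_t mask_to_positions(uint64_t mask, unsigned char out[64])
{
    size_t n = 0;
    for (unsigned lane = 0; lane < 64; lane++) {
        if (mask & ((uint64_t)1 << lane)) {
            out[n++] = (unsigned char)lane;
        }
    }
    return n;
}
```

Masks for several needles can be combined with a plain bitwise OR, for example `eq_mask64(block, ',') | eq_mask64(block, ';')`. Whether that combination stays in mask registers or round-trips through vector registers is the kind of code-gen decision the post is about.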

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Apr 4

#AMD #Ryzen9000 vs. #Intel #CoreUltra #ArrowLake On #Linux For Q1-2025 In ~400 Benchmarks
In cases where #AVX512 can be utilized, the Ryzen 9000 series is the definitive winner over the Intel Core Ultra Series 2 desktop processors. In some HPC applications the Core Ultra 9 285K with 24 physical cores does well in scenarios where SMP isn't leveraged.
Overall the #Zen5 based #Ryzen9 #9950X straight-up won 50% of the time with a first place finish.
https://www.phoronix.com/review/ryzen9000-core-ultra-linux613

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Mar 5

The Compelling #AVX512 Performance Advantage On #AMD #EPYC 9005 "Turin"
Workloads tested on this #EPYC9655 Supermicro server, with AVX-512 yielded 1.57x the performance of the same hardware/software but with AVX-512 forced off.
https://www.phoronix.com/review/amd-epyc-turin-avx512

**Hacker News** @h4ckernews@mastodon.social · Mar 1

Zen 5's AVX-512 Frequency Behavior — https://chipsandcheese.com/p/zen-5s-avx-512-frequency-behavior
#HackerNews #Zen5 #AVX512 #Frequency #Behavior #Chips #Architecture #Performance

Chips and Cheese · Mar 1 · Zen 5's AVX-512 Frequency Behavior · By Chester Lam

**postmodern** @postmodern@infosec.exchange · Feb 27

How many xmm/ymm/zmm registers did x86 have vs. x86-64? I'm seeing conflicting information on Google, claiming that x86 had zmm0-zmm31 while others claim it only had up to xmm15/ymm15/zmm15. (This might be because some webpages are confusing x86-64 in 32bit mode as being the same as the x86 architecture proper.)

#asm #assembly #x86

**camelcdr** @camelcdr@tech.lgbt · Nov 18, 2024 *

Scaling an RGB image: https://godbolt.org/z/vMojsrhcG

GCC can only vectorize it on RVV, where it generates nice code with three indexed loads and a three-segment segmented store. It fails for AVX512/NEON.

clang manages something with AVX512, but you can barely call it vectorization.
The RVV codegen looks better, but it uses fixed length vectorization and seems to have miscalculated the best LMUL choice, which causes it to spill. You get better codegen if you set -mllvm --riscv-v-fixed-length-vector-lmul-max=4.

godbolt.org · Compiler Explorer - C

    void scaleImg(size_t nh, size_t nw, size_t ow, size_t oh, float f,
                  unsigned char *restrict o, unsigned char *restrict n) {
        for (int y = 0; y < nh; y++) {
            for (int x = 0; x < nw; x++) {
                int ox = (int)(x * f);
                int oy = (int)(y * f);
                n[(y * nw + x) * 3 + 0] = o[(oy * ow + ox) * 3 + 0];
                n[(y * nw + x) * 3 + 1] = o[(oy * ow + ox) * 3 + 1];
                n[(y * nw + x) * 3 + 2] = o[(oy * ow + ox) * 3 + 2];
            }
        }
    }

#RVV #AVX512 #NEON

**OSTechNix** @ostechnix@floss.social · Nov 6, 2024

FFmpeg Sees 94x Performance Boost with Handwritten AVX-512 Code #ffmpeg #AVX512 #AssemblyCode #Opensource
https://ostechnix.com/ffmpeg-sees-94x-performance-boost-with-handwritten-avx-512-code/

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Nov 5, 2024

#FFmpeg devs boast of up to 94x performance boost after implementing handwritten #AVX512 assembly code
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements -- from three to 94 times faster -- compared to standard implementations.
https://www.tomshardware.com/pc-components/cpus/ffmpeg-devs-boast-of-up-to-94x-performance-boost-after-implementing-handwritten-avx-512-assembly-code

Tom's Hardware · Nov 4, 2024 · FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code · By Anton Shilov

**Maxi 10x** @frumble@chaos.social · Nov 5, 2024

Sounds too good to be true in any generalizable sense; I suspect that what's ultimately behind it is an Intel-funded PR stunt.

AVX-512: #FFmpeg with 94x the performance

https://www.golem.de/news/avx-512-ffmpeg-mit-94-facher-leistung-2411-190481.html

https://www.tomshardware.com/pc-components/cpus/ffmpeg-devs-boast-of-up-to-94x-performance-boost-after-implementing-handwritten-avx-512-assembly-code #AVX512

Golem.de · Nov 5, 2024 · AVX-512: FFmpeg mit 94-facher Leistung · By Boris Mayer

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Oct 24, 2024

#Intel #CoreUltra 9 285K "#ArrowLake" Delivers Strong #Linux Performance Review
Power efficiency improvements with Arrow Lake are real. Core Ultra 9 285K on average was at 136W, right in line with the 137W Ryzen 9 9950X and much lower than the 156W average of the Core i9 14900K. Core Ultra 9 285K was very competitive, but if you are running a lot of #AVX512 workloads and other areas where Zen 5 was delivering striking wins, the Ryzen 9 9950X and the ~$429 Ryzen 9 9900X can deliver great value.
https://www.phoronix.com/review/intel-core-ultra-9-285k-linux

**Radio Azureus** @RadioAzureus@mastodon.social · Aug 20, 2024 *

Important quote

On vs. off, the Ryzen 9 9950X impressively gained 56% more performance on average across all benchmarks compared to having #AVX512 acceleration turned off. The 7950X similarly saw a still impressive 41% performance improvement with AVX-512 acceleration turned on vs off

#AMD #Ryzen9000 #Linux

**FCLC** @fclc@mast.hpc.social · Aug 7, 2024

#zen5 looks like it’ll be Lat 2 on *all* integer SIMD ops

Otherwise looks great for floating point, more to come

#avx512

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Jul 28, 2024

An interview with #AMD's #MikeClark, Father of Zen — 'Zen Daddy' says 3nm #Zen5 is coming fast; also talks compact cores for desktop
AMD expands its Zen 5 architecture. Unlike Intel, which has to reduce clock speeds when its processors run #AVX512 workloads, AMD says these powerful instructions will run at the same clock speeds as standard integer operations. Clark also expanded on how the company achieved that feat and said that its #Zen5c cores can also run full AVX512.
https://www.tomshardware.com/pc-components/cpus/an-interview-with-mike-clark-the-father-of-zen-zen-daddy-talks-fast-3nm-launch-zen-5c-cores-for-desktop-chips

**FCLC** @fclc@mast.hpc.social · Jul 15, 2024

Turns out @cheese is a real 3D person!

Here he sits down with Mike Clark, chief architect of Zen, to talk about AMD's latest microarchitecture, #zen5

#HPC #x86 #microarchitecture #avx512

From: @chipsandcheese
https://techhub.social/@chipsandcheese/112790635587132915

TechHub · Chips and Cheese (@chipsandcheese@techhub.social): Hello you fine Internet folks, Today we have a different format for y'all, at AMD's Tech Day I managed to sit down with Mike Clark and have a video interview with him about Zen 5. Hope y'all enjoy! https://youtu.be/YoZ0hP9mkU4 https://chipsandcheese.com/2024/07/15/a-video-interview-with-mike-clark-chief-architect-of-zen-at-amd/

**FCLC** @fclc@mast.hpc.social · Jun 10, 2024

RISCV extensions but Vulkan

**Jeroen Ruigrok van der Werven** @asmodai@mastodon.social · Apr 6, 2024

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries [..]. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size [..]. FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4."

https://www.techpowerup.com/321201/amd-zen-5-execution-engine-leaked-features-true-512-bit-fpu

TechPowerUp · AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU: AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reported as high as 40% performance increases over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the ans...

#AMD #Zen5 #CPU

**FCLC** @fclc@mast.hpc.social · Mar 23, 2024

So turns out #Linux 6.6+ doesn’t like when you modify its microcode loading code so all twelve of us running #avx512 on #ADL can run the old MC -.-

**FCLC** @fclc@mast.hpc.social · Feb 12, 2024

It will never not be funny that the most “powerful” #avx512 instruction, vpternlog, in part came along from needing an instruction to deal with floating point BS and the creation of vfixupimmpN
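
As a rough illustration of why the instruction is called "powerful", here is a scalar C sketch of a three-input truth-table lookup, which is what the ternary-logic instruction computes per bit: the 8-bit immediate is a lookup table indexed by one bit from each of the three operands. The function name `ternlog32` is hypothetical, and this models the per-bit behaviour only, not masking or vector width.

```c
#include <stdint.h>

/* Scalar model of a three-input bitwise truth-table lookup on 32-bit words.
   At each bit position the bits of a, b and c form a 3-bit index, with a as
   the most significant bit; the result bit is bit [index] of imm8. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}
```

With this indexing, an immediate of `0x96` gives `a ^ b ^ c`, `0xE8` gives the bitwise majority, and `0xCA` gives the bitwise select `(a & b) | (~a & c)`. A handy way to derive an immediate is to evaluate the desired expression on the constants `0xF0`, `0xCC` and `0xAA` in place of `a`, `b` and `c`.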

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · Jan 18, 2024

#Benchmarking The Experimental #Ubuntu #x86_64_v3 Build For Greater Performance On Modern #CPU
With x86-64-v3 basically being #Intel #Haswell and #AMD Excavator or newer (with some exceptions like select Atoms), it would be really interesting too if #Canonical would consider an x86-64-v4 option for modern systems with #AVX512 support. It'd be really interesting to see at least an experimental Ubuntu x86-64-v4 build to see what that could mean for #servers and #HPC
https://www.phoronix.com/review/ubuntu-x86-64-v3-benchmark/
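
To make the v3/v4 distinction concrete: x86-64-v4 adds five AVX-512 subsets (F, BW, CD, DQ, VL) on top of x86-64-v3. The sketch below, which is not from the Phoronix article, checks for those five at run time using the GCC/Clang builtin `__builtin_cpu_supports`. The function name `has_v4_avx512_subsets` is hypothetical, and the check deliberately does not verify the v3 baseline itself.

```c
#include <stdbool.h>

/* Returns true when the running CPU reports the five AVX-512 subsets that
   x86-64-v4 adds on top of x86-64-v3. Returns false on non-x86 targets or
   on compilers without the GCC/Clang builtin. Does not check the v3 baseline. */
static bool has_v4_avx512_subsets(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512bw")
        && __builtin_cpu_supports("avx512cd")
        && __builtin_cpu_supports("avx512dq")
        && __builtin_cpu_supports("avx512vl");
#else
    return false;
#endif
}
```

A distribution build targeting x86-64-v4 would bake this requirement in at compile time; a run-time check like this one is the alternative for software that ships one binary and dispatches between code paths.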

**FCLC** @fclc@mast.hpc.social · Jan 5, 2024

#AVX512 stays winning!

https://masto.ai/@phoronix/111704089965380661

Mastodon · Phoronix (@phoronix@masto.ai): Intel 5th Gen Xeon "Emerald Rapids" AVX-512 Performance. With Intel's 5th Gen Xeon Scalable "Emerald Rapids" processors that were released last month, in addition to the power efficiency improvements, faster DDR5 memory support, and other enhancements, one of the other notable enhancements talked up by Intel was improved AVX-512 support. Here are some benchmarks using the flagship Intel Xeon Platinum 8592+ looking at the performance and … https://www.phoronix.com/review/intel-5th-gen-xeon-avx512
