I've run my benchmarks on a few more CPUs out of curiosity, and unlike in previous posts, I've normalized the throughput by CPU frequency. (This is not *quite* correct to do, for a variety of reasons, but it's interesting.)
On one hand, it's striking how little the normalized performance improves across Apple's M-series generations. On the other hand, it's striking just how good M1 was, and still is, by this measure.
Single-threaded; vtx0/1 are SIMD-heavy, idx is scalar. M1/v0.22 = the version without Apple-specific optimizations.
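Normalizing throughput by frequency is just a unit change: (bytes/s) / (cycles/s) = bytes per cycle. Here's a minimal sketch of that arithmetic. The function name `per_cycle` and the numbers are hypothetical, not taken from the benchmarks. The sketch also assumes a single fixed clock per CPU. Real chips boost dynamically, which is one reason the normalization is only approximate.

```python
def per_cycle(throughput_bytes_per_s: float, freq_hz: float) -> float:
    """Bytes processed per clock cycle: (bytes/s) / (cycles/s) = bytes/cycle.

    Assumes one fixed clock frequency (hypothetical simplification).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return throughput_bytes_per_s / freq_hz

# Hypothetical numbers, NOT measurements:
# CPU A does 12 GB/s at 3.2 GHz, CPU B does 15 GB/s at 4.0 GHz.
a = per_cycle(12e9, 3.2e9)
b = per_cycle(15e9, 4.0e9)
print(f"{a:.2f} B/cycle vs {b:.2f} B/cycle")  # prints "3.75 B/cycle vs 3.75 B/cycle"
```

In this made-up case, B is 25% faster in raw throughput but does exactly the same work per clock. That's the kind of gap the normalized view exposes.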