mastodon.gamedev.place is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon server focused on game development and related topics.

Arseny Kapoulkine

Fun to see just how good Apple CPUs are in single-core performance, especially if you tune the code.

The index codec on Zen4 runs into some perf issues with clang, though; I don't think Zen4 should necessarily be slower here. That said, MSVC is unfortunately even worse… we'll see if these can be fixed by tweaking the code further.

The MSVC issue was due to some seriously broken codegen; thankfully I was able to work around it, and perf on the index decoder is now closer to clang.

Clang has reasonable codegen, and it turns out gcc is the outlier here: it uses an unexpected scalar-to-vector promotion that, surprisingly, saves time by increasing IPC. Not sure I want to replicate that for now…

@zeux microsoft's neglected compiler strikes again

@zeux that is seriously impressive. Do you have an Intel CPU comparison, just out of interest?

@aras Unfortunately I don't have any to test. This makes me realize I haven't held an Intel-powered device in... 5 years?

@aras Okay, I ran this on EC2 using Amazon's c7i (Intel Sapphire Rapids) and c7a (AMD Zen4c) instances; the results are pretty fun. I've also made a normalized table where I simply divide the throughput by the frequency to normalize to 1 GHz (obviously there's no actual guarantee that performance scales that way!)
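The per-GHz normalization described in that post is a single division per entry. Below is a minimal Python sketch of it, assuming throughput is reported in GB/s and clock frequency in GHz. The helper name `per_ghz`, the CPU labels, and the sample numbers are hypothetical placeholders, not measurements from this thread.

```python
def per_ghz(throughput: float, freq_ghz: float) -> float:
    """Normalize a throughput figure to 1 GHz by dividing by the clock in GHz.

    This implicitly assumes throughput scales linearly with frequency,
    which is only a rough approximation.
    """
    if freq_ghz <= 0:
        raise ValueError("frequency must be positive")
    return throughput / freq_ghz


# Hypothetical placeholder numbers, NOT results from the thread:
# name -> (throughput in GB/s, clock in GHz)
samples = {
    "cpu_a": (6.0, 4.0),
    "cpu_b": (4.5, 3.0),
}

for name, (gbps, ghz) in samples.items():
    print(f"{name}: {per_ghz(gbps, ghz):.2f} GB/s per GHz")
```

Two CPUs landing on the same per-GHz value only means they would match if both scaled perfectly with clock speed. Anything that does not scale with frequency breaks that assumption, which is why the normalized table is best read as a rough guide.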

@aras (In particular for M4 / idx: I doubt it's handling this workload worse than M2. More likely something else, such as branch prediction, is limiting performance so that it doesn't scale with frequency as well as it could have.)