CPU code optimization is always fun. All of these statements are true:
- Doing more isn't slower than doing less
- Simpler code isn't faster than complex code
- Executing only one logical branch in an if-else isn't faster than executing both branches and picking the right result later.
- Branching isn't slower than not branching at all.
Premature optimization also includes the "obvious" things, because you'll be wrong if you don't benchmark them.
It always depends on the exact thing.
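To make the branchy vs. branchless point concrete, here is a minimal sketch in C. Everything in it (the function names `sum_branchy`, `sum_branchless`, `time_fn`, `compare`, and the threshold workload itself) is hypothetical and only for illustration. The compiler is free to turn either version into the other, so the source alone tells you nothing: time it on your own data and hardware, and check the generated assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Branchy: each element takes exactly one side of the if-else. */
static int64_t sum_branchy(const int32_t *v, size_t n, int32_t t) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= t)
            s += v[i];
        else
            s -= v[i];
    }
    return s;
}

/* Branchless: compute both candidates, then pick one with a mask. */
static int64_t sum_branchless(const int32_t *v, size_t n, int32_t t) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t x = v[i];
        int64_t mask = -(int64_t)(v[i] >= t); /* all ones if true, else 0 */
        s += (x & mask) | (-x & ~mask);
    }
    return s;
}

typedef int64_t (*sum_fn)(const int32_t *, size_t, int32_t);

/* Time a single call with clock(); coarse, but enough for a first look. */
static double time_fn(sum_fn fn, const int32_t *v, size_t n, int32_t t,
                      int64_t *out) {
    clock_t start = clock();
    *out = fn(v, n, t);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run both variants on the same input and print the timings. */
static void compare(const int32_t *v, size_t n, int32_t t) {
    int64_t a, b;
    double ta = time_fn(sum_branchy, v, n, t, &a);
    double tb = time_fn(sum_branchless, v, n, t, &b);
    assert(a == b); /* both variants must agree before timing means anything */
    printf("branchy: %.6fs  branchless: %.6fs\n", ta, tb);
}
```

Call `compare` on a large array once with sorted input and once with random input: which variant wins can flip, because a well-predicted branch is cheap while the branchless version always pays for both sides.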
@kdave @karolherbst it is a bit difficult for me to talk to complete strangers about this, but drawing conclusions from llvm-mca about how branchy code is going to compare to branchless code does not sound like a good idea
(and while at it, if you like llvm-mca, please let me point you to https://uica.uops.info: it has a great deal of research behind it, including a dissertation (see the link at the top), and is more accurate)