I tried. It's bad. It *is* faster than permutation tables though.
Bottom center panel is my attempt.
I guess I should bring out a profiler and actually see where the bottlenecks are.
Meanwhile, if someone knows a fast hash that takes 128 bits of input (`x: i32, y: i32, seed: u64`), works on AVX2 `__m256i` registers, and has good entropy in the lower bits, I'm all ears.
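As a hedged starting point for that request (not the approach used in the panel above), the sketch below swaps in MurmurHash3's 32-bit finalizer, `fmix32`, and chains it over the seed, `x` and `y`. It uses only 32-bit xor, logical right shift and wrapping multiply, each of which has a lane-wise AVX2 counterpart on `__m256i`. The helper names `fold_seed`, `hash2d` and `hash2d_x8` are hypothetical. It is written in portable scalar Rust rather than `std::arch` intrinsics, so whether the 8-wide version really compiles to AVX2 instructions is up to LLVM and should be checked in the generated assembly.

```rust
/// MurmurHash3's 32-bit finalizer (`fmix32`). Only xor, logical right shift and
/// wrapping multiply, which map lane-wise to `_mm256_xor_si256`,
/// `_mm256_srli_epi32` and `_mm256_mullo_epi32` on `__m256i`.
#[inline]
pub fn fmix32(mut h: u32) -> u32 {
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Hypothetical helper: fold the 64-bit seed into 32 bits. This only needs to
/// run once per call; the result can then be broadcast to all lanes.
#[inline]
pub fn fold_seed(seed: u64) -> u32 {
    fmix32(seed as u32 ^ fmix32((seed >> 32) as u32))
}

/// Hypothetical 2D lattice hash: two chained finalizer rounds.
/// `fmix32` is a bijection on u32 (xor-shifts and odd multipliers are
/// invertible), so for fixed `y` and `seed` the map x -> hash is a bijection,
/// and likewise y -> hash for fixed `x` and `seed`.
#[inline]
pub fn hash2d(x: i32, y: i32, seed: u64) -> u32 {
    let h = fmix32(fold_seed(seed) ^ x as u32);
    fmix32(h ^ y as u32)
}

/// Eight lattice points along one row, written as a plain array map so that
/// LLVM can (but is not guaranteed to) auto-vectorize it when AVX2 is enabled,
/// e.g. with `-C target-cpu=native`.
#[inline]
pub fn hash2d_x8(xs: [i32; 8], y: i32, seed: u64) -> [u32; 8] {
    let s = fold_seed(seed);
    std::array::from_fn(|i| fmix32(fmix32(s ^ xs[i] as u32) ^ y as u32))
}
```

For example, `hash2d_x8([0, 1, 2, 3, 4, 5, 6, 7], y, seed)` hashes eight consecutive lattice points of row `y`. One quirk: `fmix32(0) == 0`, so `hash2d(0, 0, 0)` is 0; if that fixed point matters, XOR an odd constant into the folded seed.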
`#[inline]` ALL THE THINGS!
From 82 ms to 34 ms: that's an insane 2.4x speedup for such a trivial change. That'll teach me to trust the compiler to optimize properly.
My implementation now beats most of the others, with the exception of `fastnoise2` (C++) and `simdnoise` (pure Rust). But I have something they don't: tiling.
@thomastc Wouldn't inlining always speed everything up, at the cost of a larger executable?
@mark No, because inlining everything bloats the code, which can cause instruction cache thrashing.
Honestly, I was surprised that these functions weren't all inlined automatically by the compiler, because most of them consist of just a single call to a compiler intrinsic.
https://doc.rust-lang.org/stable/reference/attributes/codegen.html#the-inline-attribute
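A minimal sketch of the pattern, assuming a hypothetical `U32x8` type standing in for a wrapper around `__m256i`: every method is a one-liner marked `#[inline]`. One commonly cited reason such functions are not inlined automatically, assuming they live in a library crate and are called from a separate benchmark crate, is that a non-generic function's body is generally not available to the optimizer in other crates unless it is marked `#[inline]` or LTO is enabled; recent compiler versions also inline some very small functions across crates on their own, so the effect depends on version and build settings. The attribute also has `#[inline(always)]` and `#[inline(never)]` forms, documented at the link above.

```rust
use std::ops::{BitXor, Shr};

/// Hypothetical 8-lane integer vector, standing in for a wrapper around
/// `__m256i`. Portable arrays are used here so the sketch runs on any target.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U32x8(pub [u32; 8]);

impl U32x8 {
    /// Broadcast one value to all lanes (cf. `_mm256_set1_epi32`).
    #[inline]
    pub fn splat(v: u32) -> Self {
        U32x8([v; 8])
    }

    /// Lane-wise wrapping multiply (cf. `_mm256_mullo_epi32`).
    #[inline]
    pub fn wrapping_mul(self, rhs: Self) -> Self {
        U32x8(std::array::from_fn(|i| self.0[i].wrapping_mul(rhs.0[i])))
    }
}

impl BitXor for U32x8 {
    type Output = Self;

    /// Lane-wise XOR (cf. `_mm256_xor_si256`).
    #[inline]
    fn bitxor(self, rhs: Self) -> Self {
        U32x8(std::array::from_fn(|i| self.0[i] ^ rhs.0[i]))
    }
}

impl Shr<u32> for U32x8 {
    type Output = Self;

    /// Lane-wise logical right shift (cf. `_mm256_srli_epi32`).
    /// Shift amounts of 32 or more overflow, as for scalar `u32`.
    #[inline]
    fn shr(self, n: u32) -> Self {
        U32x8(std::array::from_fn(|i| self.0[i] >> n))
    }
}

/// A composite function built from the wrappers. If the wrappers were not
/// inlinable into a downstream crate, each lane operation here could end up
/// as a real function call instead of a single instruction.
#[inline]
pub fn mix(v: U32x8) -> U32x8 {
    let v = v ^ (v >> 16);
    v.wrapping_mul(U32x8::splat(0x85eb_ca6b))
}
```

Marking only these thin wrappers, rather than large functions, keeps the code-size cost small, which is the trade-off raised in the reply about instruction cache thrashing.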