Back

I tried. It's bad. It *is* faster than permutation tables though.

Bottom center panel is my attempt.

A grid of 3x3 images of simplex noise. The one at the bottom center has a repeating pattern.

**thomastc | frozenfractal** @thomastc · Mar 18

AES instructions don't fare any better.

A similar image as before, but with a different repeating pattern.

**thomastc | frozenfractal** @thomastc · Mar 18

I guess I should bring out a profiler and actually see where the bottlenecks are.

Meanwhile, if someone knows a fast hash that takes 128 bits as input (x: i32, y: i32, seed: u64), works in AVX __mm256i registers and has good entropy in the lower bits, I'm all ears.

**Marcus Müller** @funkylab@mastodon.social · Mar 18

@thomastc what needs to be the output width of your hash? Otherwise, XOROSHIRO128+ / XOROSHIRO128++ have 128 bit state, generate 64 bit high-entropy pseudo-random numbers, and can very heavily be optimized. I suspect you don't need a single random number at a time, and there's AVX2 implementations out there that can generate four 64 bit values at a time.

**Marcus Müller** @funkylab@mastodon.social · Mar 18

@thomastc (note that in your "hash" use case, you don't simply return the sum of the lower and upper half of the 128 bits of state, but first do the state modification, i.e.,
```
uint64 state[2] = {x << 32 | y, SEED};
uint64_t tmp = state[1] ^ state[0];
return (rot_left(state[0], 55) ^ tmp ^ (tmp<<14)) + rot_left(tmp 36);
```
with `rot_left(x, k)` simply being `(x<<k) | (x >> (64-k))`.

**thomastc | frozenfractal** @thomastc · Mar 19

@funkylab Thanks! I'm already tied closely to 32-bit numbers because I'm using __mm256i AVX registers with 8 lanes. Splitting up all the 64-bit operations would probably negate any performance gain.

I could instead try xoshiro128+ which has a 128-bits state (conveniently equal to my input size) and is specifically targeted at generating floats. https://prng.di.unimi.it/xoshiro128plus.c

I need two floats, so I could use s[0]+s[3] for one, and s[1]+s[2] for the other.

**Marcus Müller** @funkylab@mastodon.social · Mar 19 *

Mar 19 *

@thomastc what I coded above **is** Xoroshiro128+. The state *has* 128 bit (2× uint64_t, which I just wrote that way for ease of understanding).

**Marcus Müller** @funkylab@mastodon.social · Mar 19 *

Mar 19 *

@thomastc and as said, there's AVX2 impls of it, which do 4× generations (of 64 bit each) at once. The world is good :) (the ops don't need many _mm256 regs, so the intermediate steps should not be a problem – quite the contrary, if your CPU is dual issue on these ops, you get that for free)

**thomastc | frozenfractal** @thomastc · Mar 19

@funkylab Huh, I don't know why I thought you wrote 256. Been staring too much at powers of 2 lately

In any case, it turns out I need to run about 10 rounds of this before the output appears sufficiently random. Which makes sense, because adjacent grid points only differ by 1 bit, and the hash doesn't have enough avalanching going on to make that 1 bit affect everything.

And if I have to do 10 rounds, it turns out PCG2D wins the performance battle hands down.

**Marcus Müller** @funkylab@mastodon.social · Mar 19

thomastc | frozenfractal @thomastc@mastodon.gamedev.place

@thomastc that must be wrong. The entropy of xoroshiro128+ is quite good, like, measured in large-scale statistic terms. Do you perhaps forgo to update what you call "seed" in your original question from the value you generate? The code snipped I sent was explicitly just for a single value :)

@funkylab Here's what I'm seeing: https://godbolt.org/z/rd349MqWf Those two output numbers are way too similar for my purposes. Am I doing something wrong?

godbolt.orgCompiler Explorer - C++ uint64_t rot_left(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); } int32_t hash(int32_t x, int32_t y, uint64_t seed) { uint64_t state[2] = {uint64_t(uint32_t(x)) << 32 | uint32_t(y), seed}; uint64_t tmp = state[1] ^ state[0]; return (rot_left(state[0], 55) ^ tmp ^ (tmp<<14)) + rot_left(tmp, 36); } int main() { uint64_t seed = 14131954677241054293ull; printf("These are way too similar:\n"); printf("%08x\n", hash(0, 0, seed)); printf("%08x\n", hash(1, 0, seed)); printf("And it doesn't get better farther away from the origin:\n"); printf("%08x\n", hash(3891095228, 2465519915, seed)); printf("%08x\n", hash(3891095229, 2465519915, seed)); }

Mar 19, 2025, 02:34 PM·

0boosts·1favorite

**Marcus Müller** @funkylab@mastodon.social · Mar 19

godbolt.orgCompiler Explorer - C++ (x86-64 gcc 14.2) uint32_t hash(int32_t x, int32_t y, uint64_t seed) { // based on splitmix64, but do let x << k overlap with y, to resolve local // similarity uint64_t z = ((uint64_t(x) << 30) | y) | (seed + 0x9e3779b97f4a7c15); z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9; z = (z ^ (z >> 27)) * 0x94d049bb133111eb; return z ^ (z >> 31); } // wish we had std::simd std::array<uint32_t, 4> hash(std::array<int32_t, 4> x, std::array<int32_t, 4> y, uint64_t seed) { std::array<uint64_t, 4> z; for (char i = 0; i < 4; ++i) { z[i] = ((uint64_t(x[i]) << 30) | y[i]) | (seed + 0x9e3779b97f4a7c15); } for (char i = 0; i < 4; ++i) { z[i] = (z[i] ^ (z[i] >> 30)) * 0xbf58476d1ce4e5b9; } for (char i = 0; i < 4; ++i) { z[i] = (z[i] ^ (z[i] >> 27)) * 0x94d049bb133111eb; } for (char i = 0; i < 4; ++i) { z[i] ^= z[i] >> 31; } return {static_cast<uint32_t>(z[0]), static_cast<uint32_t>(z[1]), static_cast<uint32_t>(z[2]), static_cast<uint32_t>(z[3])}; } int main() { uint64_t seed = 14131954677241054293ull; std::print("These are way too similar:\n"); std::print("{0:032b}|{0:08x}\n", hash(0, 0, seed)); std::print("{0:032b}|{0:08x}\n", hash(1, 0, seed)); std::print("And it doesn't get better farther away from the origin:\n"); std::print("{0:032b}|{0:08x}\n", hash(3891095228, 2465519915, seed)); std::print("{0:032b}|{0:08x}\n", hash(3891095229, 2465519915, seed)); std::print("{:—<50}\n", ""); // more at once std::array<int32_t, 4> x = {0, 1, static_cast<int32_t>(3891095228), static_cast<int32_t>(3891095229)}; std::array<int32_t, 4> y = {0, 0, static_cast<int32_t>(2465519915), static_cast<int32_t>(2465519915)}; auto res = hash(x, y, seed); std::print("{0:032b}|{0:08x}\n", res[0]); std::print("{0:032b}|{0:08x}\n", res[1]); std::print("{0:032b}|{0:08x}\n", res[2]); std::print("{0:032b}|{0:08x}\n", res[3]); }

@thomastc aaaah now I understand what you're trying to achieve!
Maybe then a simple splitmix64 is nicer for you:
https://godbolt.org/z/b6Yx4fWGs
Note that you could shorten the constants considering you're only keeping the lower 32 bit. Also note that I'm disappointed in GCC missing the chance to put all four array elements in the vector version into the same 256 bit SIMD register automatically.

**thomastc | frozenfractal** @thomastc · Mar 19