@thomastc Again in GLSL, I like to use the pcg4d hash from https://jcgt.org/published/0009/03/02/
Perhaps you could adapt it to SSE on the CPU. Or try applying a little bit-swizzling to the CRC32 input or output.
@scdollins Hmm, interesting. I had seen that paper, but was a bit overwhelmed with the number of options. I think I'd want pcg3d, and pass the seed as the third dimension, and then XOR the three outputs together to get the index into the gradient table?
@scdollins Orrr maybe I could interpret the output as a vec2 of floats and normalize it, doing away with the gradient table altogether.
@scdollins In terms of performance, pcg2d is very promising. Total running time went from 29 ms to 21 ms. I think that would pretty much eliminate the hashing as a bottleneck.
The output still leaves something to be desired, though. But who's counting?