Random – Pseudo Random Number Generation on the GPU

random

Idea

For a Path Tracing application I need to generate good quality pseudo random numbers in the closed range [0~1]. Because I'm doing this on the GPU (HLSL Shader Model 5) I can only use 32bit variables. My initial approach is the following set-up:

  1. Ever Frame the pixel shader receive a good pseudo random number ([0~1]) from the CPU using C++'s std::mt19937 generator and std::uniform_real_distribution.

  2. Because for each pixel this number is the same I also use the screen coordinates u and v of each pixel these are also in [0~1].

  3. I then call the Multiply With Carry method like below.

Algorithm

// seed is the value given from the cpu
float3 random = Random(seed + u, seed + v)l

// Multiply With Carry, returns 3 floating point values {x, y, z}
// x: the random number
// y, z: new seeds for the next time we need a random number
float3 Random(float seed_a, float seed_b)
{   
    uint m_z = asuint(seed_a);
    uint m_w = asuint(seed_b);

    m_z = 36969 * (m_z & 65535) + (m_z >> 16);
    m_w = 18000 * (m_w & 65535) + (m_w >> 16);  

    float r = ((m_z << 16) + m_w) / (float)0xFFFFFFFF;
    return float3(r, asfloat(m_z), asfloat(m_w));
}

This produces the following output. (The left part is the random number obtained from the Random method for this pixel, the right part is the visualization of u and v as Red and Green.

Result

Not so random

As you can see there is clearly a pattern, so the randomness is not good at all. Which hurts the performance of my algorithm tremendously. This is probably due to the fact that the original Multiply with Carry method assumes m_z and m_w are 64 bit integers, not 32 bit ones.

What I want

What I'm looking for is a way to fix my implementation of the Multiply with Carry method so that it produces reasonably good pseudo random numbers and works in the closed [0~1] interval instead of the open [0~1] interval. However since it is quite possible that this method can only work right using 64 bits integers I'm would also be really glad if someone can suggest another pseudo random number generator algorithm that:

  • Works with 32 bit numbers
  • Produces uniformly distributed results in the closed [0~1] interval
  • Does not require too much state information, (this is why I chose MwC since it only needs to store 2 variables) since that is hard on the GPU. 16 32 bit variables would be the maximum I think since I can store that in 4×4 matrix which is easy to pass around.

Best Answer

I queried fellow students and they brought my attention to this paper: https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch37.html

Since the paper is quite long and might not be online forever below is the general idea:

Linear Congruent Generators are ideal for use on the GPU because they are simple and do not require much state (only the previously generated number). but they are not 'random enough' for, for example, Monte Carlo based simulation. A generator like a Mersenne Twister would be better but requires way too much state to be stored.

The solution proposed by the paper is combining several LCGs using a combined Tausworthe Generator (as used by the Mersenne Twister) this guarantees a much better randomness without having to store as much state as the Mersenne Twister. The final algorithm looks like this:

struct RandomResult
{
    uint4 state;
    float value;
};

uint TausStep(uint z, int S1, int S2, int S3, uint M)
{
    uint b = (((z << S1) ^ z) >> S2);
    return (((z & M) << S3) ^ b);    
}

uint LCGStep(uint z, uint A, uint C)
{
    return (A * z + C);    
}

RandomResult Random(uint4 state)
{
    state.x = TausStep(state.x, 13, 19, 12, 4294967294);
    state.y = TausStep(state.y, 2, 25, 4, 4294967288);
    state.z = TausStep(state.z, 3, 11, 17, 4294967280);
    state.w = LCGStep(state.w, 1664525, 1013904223);

    RandomResult result;
    result.state = state;
    result.value = 2.3283064365387e-10 * (state.x ^ state.y ^ state.z ^ state.w);

    return result;
}

Note that the initial seed values for state should be larger than 128! (For background information reed the paper) and that you should fill the seed with 4 good random numbers from the CPU + four values unique for that pixel to get a nice result.

Related Question