Time Series – How to Generate Uniformly Distributed Samples with a Given Autocorrelation Function

autocorrelationrandom-generationsamplingtime seriesuniform distribution

As I mentioned in the question title, I want to generate specific uniformly distributed samples.

I need them to model a real world scenario. For my real data, I estimated a function, which approximates the auto-correlation function by a e-function. I also modeled a distribution model for my real data.

I can generate samples, which follow my distribution model with the inverse transform sampling.

If i think correct, instead of "standard" uniform distributed samples, which i feed into the inverse transform sampling, i need to adapt those uniform distributed samples, so that their auto-correlation function follows my estimated e-function.

Now the question is, if i am correct, how to adapt my uniform distributed samples, that they correspond to my requirements.

It would be great if, somebody could help.

EDIT 1:

Here is a plot of the auto-correlation function of my real data (different scenarios) and the modeled function (green). The ACF of my uniform distributed samples should follow the green (respectively blue) line.

EDIT 2:

Thank you so much so far for the answers. To make my problem a bit clearer, i added a illustration of the problem. For a simulation, i need to generate samples, which follow my distribution model (with inverse transform sampling a easy task) AND have the same auto-correlation behavior like the real data.

My idea was, that i need to adapt the uniformly distributed samples, which i feed into the inverse transform sampling, so that their ACF follows my model.

The problem is, that i don't know if this idea is correct and that i don't know how to adapt the uniformly distributed samples.

EDIT 3:
Following the approach from jblood94, the solution is based on a reordering of the output samples after the inverse transform sampling. So, the underlying distribution stays the same AND the variables follow the desired acf.

I tried to translate the R code from jblood94 to matlab:

clear;
close all;
clc;


x = gamrnd(0.9, 1, 1e6, 1);
alpha = [1, sort(betarnd(1, 3, 1, 50), 'descend')];

tic
xx = acf_reorder(x, alpha);
toc

figure;
autocorr(xx, 'NumLags', length(alpha)-1);
hold on;
plot((1:length(alpha))-1, alpha);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function xx = acf_reorder(x, alpha)

tol = 1e-5;
maxIter = 10;
n = length(x);
xx = sort(x);
y = randn(n,1);
[w0, w, alpha1] = deal(alpha);
m = length(alpha);
m1 = 1:(m-1);
tol10 = tol/10;
iter = 0;
x = xx(tiedrank(filter(w, 1, y)));
SSE0 = inf;

acfInit = autocorr(x, 'NumLags', m-1)';
SSE = sum((acfInit(1:m)-alpha).^2);

while SSE > tol
    if(SSE < SSE0)
        SSE0 = SSE;
        w = w0;
        iter = iter + 1;
        if(iter == maxIter)
            break;
        end
    else
        break;
    end

    w1 = w0;
    a = 0;
    sse0 = inf;

    while (max(abs(alpha1 - a)) > tol10)
        a = [1, arrayfun(@(x) sum(head(w1, -x).*tail(w1, -x)), m1)/sum(w1.^2)];

        sse = sum((a - alpha1).^2);
        if (sse < sse0)
            sse0 = sse;
            w0 = w1;
        else
            % w0 failed to converge; try optim
            disp('w0 failed to converge; try optim');
            w1 = exp(fminsearch(@fun, log(w0), [], m1));

            a = [1, arrayfun(@(x) sum(head(w1, -x).*tail(w1, -x)), m1)/sum(w1.^2)];
            if (sum((a - alpha1).^2) < sse0)
                w0 = w1;
                break;
            end
        end

        w1 = (w1.*alpha1./a + w1)/2;
    end

    x = xx(tiedrank(filter(w0, 1, y)));
    acf = autocorr(x, 'NumLags', m-1);
    alpha1 = (alpha1.*alpha./acf(1:m) + alpha1)/2;

    SSE = sum((acf(1:m)-alpha).^2);
end

xx = xx(tiedrank(filter(w, 1, y)));

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function output = fun(ww, m1)

ww = exp(ww);
output = sum([1, arrayfun(@(x) sum(head(ww, -x).*tail(ww, -x)), m1)/sum(ww.^2)]);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function dataOut = head(dataIn, num)

dataIn = dataIn(:);

if num>0
    dataOut = dataIn(1:num, :);
elseif num<0
    dataOut = dataIn(1:(length(dataIn)+num));
end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function dataOut = tail(dataIn, num)

dataIn = dataIn(:);

if num>0
    dataOut = dataIn((length(dataIn) - num + 1):length(dataIn),:);
elseif num<0
    dataOut = dataIn((-num+1):length(dataIn),:);
end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This yields SOMETIMES plots like this, and sometimes it's trapped in the inner while loop (I need to fix this). As you can see, the plot shows, that the result is not ideal.

EDIT 4:

Maybe I didn't formulate my problem correctly at first: I want to generate a time series that follows a defined distribution (in my case approximately Half-Gaussian) and whose autocorrelation function follows a given autocorrelation function (e.g. up to lag 30). So I have two properties that my time series must fulfil.

I tried the following approaches (without success):

generation of uniformly distributed samples
invese transform sampling with desired distribution
Reordering the samples to match the ACF
Failed: it takes far too long for my time series (1e6 samples) + acf could not be reached.

generation of normally distributed samples with desired autocorrelation (with the approach from jblood94)
invese transform sampling with uniform distribution (based on this solution)
invese transform sampling with desired distribution
Failed: the autocorrelation of the transformed samples does not correspond to the desired Acf, which was correct before the transformation.

I think approach 2 might work. However, it must be prevented that the autocorrelation is changed by the transformation. Or the initial autocorrelation must be adjusted in such a way that it matches the desired Acf again, at least after the transformation.

Now the question is how the initial autocorrelation needs to be adjusted. Or is the approach correct? First generate normally distributed samples with autocorrelation, transform them into normally distributed samples, and then transform them into my desired distribution?

In addition, so far I have only managed to change individual lags of the normal distribution, but not that lag 1-30, for example, follow my function.

EDIT 5:

I tested approach 2 again. It seems, that the change in the autocorrelation is dependent from the acf itself. I generated samples, which follow now my distribution and my acf. I will post plots later.

This approach seems to work for me, but I can't guarantee, that it works in general.

Best Answer

This problem can be approached by first generating samples from the desired distribution and then reordering them to match the desired autocorrelation function.

The R code below demonstrates an approach based on this answer that can be modified for any desired ACF and distribution. The example generates $n=10^6$ samples from $\text{Gamma}(0.9,1)$ whose ACF follows 50 random samples from $\text{Beta}(1,3)$, sorted descending.

The process is as follows.

Generate the desired number of samples, $X=\{x_1,...,x_n\}$, from the target distribution. Set $\alpha_0$ equal to the desired ACF. Initialize the target ACF, $\alpha$, as the desired ACF.
Find a set of weights that, when passed to filter along with $n$ random normal variates, results in a series, $Y$, with $ACF=\alpha$ (see the answer linked above).
Reorder $X$ so that its rank ordering matches that of $Y$. If $X$ is normally distributed, the resulting series should have the desired ACF; however, the more $X$ deviates from normality, the more the ACF will deviate from $\alpha_0$ (the example below has a target distribution of $\text{Gamma}(0.9,1)$, which is very "non-normal"). Update the target ACF, $\alpha$, according to $\alpha'=\frac{\alpha}{2}\Big(\frac{\alpha_0}{ACF}+1\Big)$ and repeat steps 1-3 until the ACF of the reordered $X$ converges.

The function that performs the reordering (it works only for positive values for alpha):

acf.reorder <- function(x, alpha) {
  tol <- 1e-5
  maxIter <- 10L
  n <- length(x)
  xx <- sort(x)
  y <- rnorm(n)
  w0 <- w <- alpha1 <- alpha
  m <- length(alpha)
  i1 <- sequence((m - 1):1)
  i2 <- sequence((m - 1):1, 2:m)
  i3 <- cumsum((m - 1):1)
  tol10 <- tol/10
  iter <- 0L
  x <- xx[rank(filter(y, w, circular = TRUE))]
  SSE0 <- Inf
  f <- function(ww) {
    sum((c(1, diff(c(0, cumsum(ww[i1]*(ww[i2]))[i3]))/sum(ww^2)) - alpha1)^2)
  }
  ACF <- function(x) acf(x, lag.max = m - 1, plot = FALSE)$acf[1:m]
  
  while ((SSE <- sum((ACF(x) - alpha)^2)) > tol) {
    if (SSE < SSE0) {
      SSE0 <- SSE
      w <- w0
    }
    if ((iter <- iter + 1L) == maxIter) break
    w1 <- w0
    a <- 0
    sse0 <- Inf
    
    while (max(abs(alpha1 - a)) > tol10) {
      a <- c(1, diff(c(0, cumsum(w1[i1]*(w1[i2]))[i3]))/sum(w1^2))
      
      if ((sse <- sum((a - alpha1)^2)) < sse0) {
        sse0 <- sse
        w0 <- w1
      } else {
        # w0 failed to converge; try optim
        w1 <- optim(w0, f, method = "L-BFGS-B")$par
        a <- c(1, diff(c(0, cumsum(w1[i1]*(w1[i2]))[i3]))/sum(w1^2))
        if (sum((a - alpha1)^2) < sse0) w0 <- w1
        break
      }
      
      w1 <- (w1*alpha1/a + w1)/2
    }
    
    x <- xx[rank(filter(y, w0, circular = TRUE))]
    alpha1 <- (alpha1*alpha/ACF(x) + alpha1)/2
  }
  
  xx[rank(filter(y, w, circular = TRUE))]
}

Generate samples from the target distribution and specify the desired ACF:

set.seed(1960841256)
x <- rgamma(1e6, 0.9, 1)
alpha <- c(1, sort(rbeta(50, 1, 3), TRUE))

Reorder x and plot its ACF against alpha:

system.time(x <- acf.reorder(x, alpha))
#>    user  system elapsed 
#>    7.13    0.41    7.53
acf(x, lag.max = length(alpha) - 1)
lines(seq_along(alpha) - 1, alpha, col = "green")

The resulting ACF is a good match with the target, and since x has simply been reordered, it is known to have the desired distribution.

Update: Worst of 100 iterations

The original code was breaking too early for some cases as noted by the OP in the updated question. The above code has been modified to correct the behavior. I also removed the restriction that the ACF be strictly positive.

I ran the same procedure 100 times, each with a new random vector for x and alpha. The longest-running iteration took 7.36 seconds, and the worst performing iteration (by maximum absolute difference of the achieved vs. desired ACF) had a maximum absolute error of 0.056. It is plotted below.

Best Answer

Update: Worst of 100 iterations

Related Solutions

Random Generation – Generating Random Matrices with Constraints on Row and Column Length

Solved – Why can we assume a single sample is normally distributed

Related Question