Solved – Sampling from Empirical CDF for Forecasting

empirical-cumulative-distr-fn, forecasting, r

I'm trying to forecast the future distribution of a particular interest rate based on its quarterly percentage changes. My assumptions are that:

The observations are independent
The distribution holds across time (stationarity of the quarterly percentage changes)

When I run Shapiro / K-S tests of normality on my historical data, I find very strong evidence against the null hypothesis that my data could have been generated from a normal distribution, so I want to forecast based on the empirical distribution.

My questions are:

Is there any way to determine whether or not using the empirical distribution gives a better estimate than using a normal distribution?
I'm using $\textsf{R}$'s sample(x, size) command to generate potential paths for MC simulation — is this the "right" way to sample from the empirical distribution? Are there issues I'm failing to consider properly since the empirical distribution is discrete?
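For concreteness, here is a minimal sketch of the i.i.d. resampling described in the second question. The data vector x, the horizon and the number of paths are made-up placeholders, not values from the question. One detail worth checking in any such setup: sample() draws without replacement by default, so replace = TRUE is needed for the draws to come from the empirical distribution.

```r
set.seed(42)

# Hypothetical history of quarterly percentage changes (illustration only)
x <- rt(60, df = 4) * 2

horizon <- 8      # quarters to simulate per path (assumption)
n_paths <- 1000   # number of Monte Carlo paths (assumption)

# replace = TRUE makes each draw an independent pick from the empirical
# distribution. With the default replace = FALSE, sample() would fail here
# because horizon * n_paths exceeds length(x).
draws <- matrix(sample(x, horizon * n_paths, replace = TRUE),
                nrow = horizon, ncol = n_paths)

# Each column is one simulated path of quarterly percentage changes
dim(draws)
```

Every simulated value is one of the observed changes, which is the discreteness the question asks about.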

Many thanks.

Best Answer

As pointed out by the other poster, you cannot treat time series data as a simple random sample, because observations that are adjacent in time are correlated. A nice nonparametric approach to generating sample paths is the block bootstrap; see also http://nccur.lib.nccu.edu.tw/bitstream/140.119/35143/6/51007106.pdf

Note that the first link also points you to the handy tsboot function in R's boot package.
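To make the idea concrete, here is a hand-rolled sketch of a moving-block bootstrap in base R. It is not the tsboot implementation, and everything in it is an assumption for illustration: the simulated series pct_change, the block length, the horizon, the starting rate rate0 and the helper name block_path. Resampling contiguous blocks rather than single observations keeps the short-range dependence inside each block.

```r
set.seed(1)

# Hypothetical data: 80 autocorrelated quarterly percentage changes
pct_change <- as.numeric(arima.sim(list(ar = 0.5), n = 80))

# Moving-block bootstrap: draw overlapping blocks of length block_len
# with replacement and concatenate them until the path has `horizon` steps.
block_path <- function(x, horizon, block_len) {
  n_blocks <- ceiling(horizon / block_len)
  starts <- sample(length(x) - block_len + 1, n_blocks, replace = TRUE)
  idx <- as.vector(sapply(starts, function(s) s:(s + block_len - 1)))
  x[idx[seq_len(horizon)]]
}

horizon   <- 8     # quarters ahead (assumption)
block_len <- 4     # block length (assumption; needs tuning in practice)
n_paths   <- 1000  # number of simulated paths (assumption)

# horizon x n_paths matrix of simulated quarterly percentage changes
paths <- replicate(n_paths, block_path(pct_change, horizon, block_len))

# Turn percentage changes into rate levels from a hypothetical starting rate
rate0 <- 3.0
rate_paths <- rate0 * apply(1 + paths / 100, 2, cumprod)

# Forecast distribution at the end of the horizon
quantile(rate_paths[horizon, ], c(0.05, 0.5, 0.95))
```

The block length controls how much dependence is preserved: a length of 1 reduces to ordinary i.i.d. resampling, while longer blocks keep more of the serial structure at the cost of fewer distinct blocks to draw from.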

Related Solutions

Solved – Dividing and forecasting a normal distribution

The straight answer to Q1 is "yes", it is definitely possible to cut up an underlying normally distributed continuous variable into an ordinal variable with levels 1 to 10. You need something that can tell you the cumulative distribution function (often called the CDF) of a normal distribution with a given mean and variance (you only need these two parameters to characterise a normal distribution). The CDF returns the cumulative probability of a value at X or lower, so the probability of each bin is the difference between the CDF values at its upper and lower cutoffs.

I'm sorry I don't use C#, but in R this would be something like the below. This is a 10-point example in which the normal distribution you think is your underlying latent variable has a mean of 5 and a standard deviation of 2 (the third argument of pnorm is the standard deviation, not the variance), and the bins are minus infinity to 1.5, 1.5 to 2.5, 2.5 to 3.5, ... , 9.5 to infinity.

> options(digits=2)
> x <- pnorm(1:10+0.5, 5, 2)*100
> x[10] <- 100            # otherwise is just 9.5 to 10.5, not infinity
> x                       # ie cumulative prob (in %) to each bin
 [1]   4  11  23  40  60  77  89  96  99 100    
> c(x[1], diff(x))        # differences between the cumulative probs
 [1]  4.0  6.6 12.1 17.5 19.7 17.5 12.1  6.6  2.8  1.2
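The same calculation can be wrapped in a small function. This is only a sketch: the name bin_probs and its arguments are hypothetical. Passing -Inf and Inf as the outer cutoffs handles the open-ended first and last bins directly, so no manual overwrite of the last cumulative value is needed.

```r
# Probability of each bin for a normal latent variable.
# `cuts` holds the interior cutoffs; the outer bins extend to +/- infinity.
bin_probs <- function(cuts, mean, sd) {
  diff(pnorm(c(-Inf, cuts, Inf), mean = mean, sd = sd))
}

# Same setup as the transcript: mean 5, standard deviation 2,
# interior cutoffs at 1.5, 2.5, ..., 9.5 giving 10 bins
p <- bin_probs(1:9 + 0.5, mean = 5, sd = 2)

round(p * 100, 1)  # percentages per bin
sum(p)             # probabilities over all bins add up to 1
```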

Likewise, the straight answer to Q2 is also "yes": there are definitely such methods, but they should be used with caution, and it is difficult to summarise all the pros and cons of the different approaches here.

It's also worth knowing that there are other methods for analysing this sort of ordinal data.
