Solved – Confidence Interval / Best-fit / Prediction Interval

confidence interval

Before you begin reading, I want to say thank you for helping, or attempting to help me. I really appreciate any help you can give me! Also, warning: Wall of Text approaching fast.

I've been informed that I am misusing the term Confidence Interval. I have been told that what I am actually looking for is the "Prediction Interval". I'm not sure though as I know my coworker who assigned me this task said Confidence Interval.

Objectives:

Quantify the variability of exchange rates between system #1 and system #2
Determine bestfit distribution Find
95% confidence interval

Please note: I am doing this in excel, so I will be using excel functions to calculate stdev ect. But if you aren't familiar with excel you can still be of help to me! The concepts are what I need help with!

Here an explanation of the data I was given and what I have done so far:
System #1 and #2 BOTH have two lists of 48 numbers. One list being the closing market price in USD and the other list the closing market price in the native currency.
So I have "Sys#1 USD", "Sys#1 Native", "Sys#2 USD" and "Sys#2 Native" columns of 48 values each. System #1 is the actual closing mkt price, while System #2 is the one we are testing to see how much it differs from the correct values.

(1.)
I found the exchange rate to the dollar for each system by simply dividing Native/USD for the corresponding system.

(#1Native/#1USD) = (#1Native/#1 $1 USD) and (#2Native/#2USD) = (#2Native/#2 $1 USD)         `$`

(2.)
I then found the percent error of the foreign exchange rates of System 2 compared to System 1.

[ ( (abs[(#2Native/#2 $ 1USD) - (#1Native/ #1 $ 1USD)]) / (#1Native/ #1 $1USD) ] * 100

(3.)
I proceeded to find the standard deviation and mean using the simple functions excel comes equipped with. Excel functions below.

=STDEV(values)
=AVERAGE(values)

(4.)
I was informed that using the =CONFIDENCE function in excel was actually NOT what I want because it calculated the CI with the true mean of all future data, and I do not know the true mean value of all future data, only of my sample of 48 days. I was told to use the =NORMINV(probability,mean,standard_dev) function by my coworker. To my understanding, this method "fits a normal distribution to the data and then makes a prediction assuming that this fit is correct."
I'm not sure if my data is a normal distribution, so do not know if I can use =NORMINV?

So basically, how do I calculate a 95% confidence interval of this data and determine the best fit distribution? Should I be using =NORMINV?

Thank you so much for your help!

Best Answer

You should read Spreadsheet Adiction and the links from that page before trusting any results from Excel.

From your question it appears that you don't have a firm grasp on what confidence intervals and prediction intervals are. You should really consult a good intro stats book, and/or take a class or meet with a consultant to get these concepts down. But here is a short explanation:

The condifence interval is a statement about where we believe the true population parameter (the mean above) to be based on the sample data. So not knowing the population mean does not mean that you cannot do a confidence interval. If your sample is large and you are willing to assume that the population is not overly skewed or would produce outliers, then the Central Limit Theorem says that a confidence interval on the mean based on the assumption of a normal population will be a good approximation even if the population is not normal. So you can use normal based theory without knowing if the population is normal as long as you are willing to make the above assumptions.

The prediction interval is a statement about where we expect future individual data points to be. This prediction will depend much more on the shape of the distribution.

The big difference in concept is whether you are talking about the mean of all future data, or individual data points (I could not tell which you are interested in from the question).

The norminv function in Excel does not fit a normal distribution, but gives the x-value for a given area under the curve (probability) for a normal with the specified mean and standard deviation. That function could be used as part of the computations to get either of the intervals, but that assumes that you know the population standard deviation, if you are using the sample standard deviation then it is more appropriate to use the t distribution rather than the normal. Also note that the prediction interval takes into account the uncertainty in you estimate of the mean and standard deviation in addition to the randomness of the individual data points, so norminv probably is not what you want.

Related Solutions

Solved – How to a 95% confidence interval not overlap with the trendline forecast

You derive your confidence intervals from errors that you simulate from your optimized minimum error and by assuming the mean of erros to be zero. If your confidence interval is not conforming to your forecast line then this could either be due to non-zero mean or be due to non-normal distributions of errors in the past and hence in the future also. Check whether your past errors were normal or not. I believe if u are using Holt Winters' then you already would have been correctling for autocorrelation. So besides that, the only thing I can figure out is non-normal distribution of errors which may demand then to try non-linear base model than linear ones. In which case you would have to change your one step forecast formula from linear function to non-linear function.

P.S. I am assuming there are no coding errors, for which I am not an expert.

Solved – Bootstrapping confidence interval from a regression prediction

Bootstrapping refers to resample your data with replacement. That is, instead of fitting your model to the original X and y, you fit your model to resampled versions of X and y for multiple times.

Thus, you get n slightly different models which you can use to create a confidence interval. Here is a visual example of such an interval.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Create toy data 
x = np.linspace(0, 10, 20)
y = x + (np.random.rand(len(x)) * 10)

# Extend x data to contain another row vector of 1s
X = np.vstack([x, np.ones(len(x))]).T

plt.figure(figsize=(12,8))
for i in range(0, 500):
    sample_index = np.random.choice(range(0, len(y)), len(y))

    X_samples = X[sample_index]
    y_samples = y[sample_index]    

    lr = LinearRegression()
    lr.fit(X_samples, y_samples)
    plt.plot(x, lr.predict(X), color='grey', alpha=0.2, zorder=1)

plt.scatter(x,y, marker='o', color='orange', zorder=4)

lr = LinearRegression()
lr.fit(X, y)
plt.plot(x, lr.predict(X), color='red', zorder=5)

Best Answer

Related Solutions

Solved – How to a 95% confidence interval not overlap with the trendline forecast

Solved – Bootstrapping confidence interval from a regression prediction

Related Question