Hat Matrix and Leverages – Classical Multiple Regression Techniques

Tags: leverage, multiple regression, references, regression, self-study

What are the hat matrix and leverages in classical multiple regression? What are their roles? And why do we use them?

Please explain them or give satisfactory book/article references to understand them.

Best Answer

The hat matrix, $\bf H$, is the projection matrix that expresses the values of the observations on the dependent variable, $\bf y$, as linear combinations of the column vectors of the model matrix, $\bf X$, which contains the observations on each of the variables you are regressing on.

Naturally, $\bf y$ will typically not lie in the column space of $\bf X$, so there will be a difference between this projection, $\bf \hat y$, and the actual values of $\bf y$. This difference is the residual, $\bf e = y - X\hat\beta = y - \hat y$:

[Figure: the vector $\bf y$ projected onto the column space of $\bf X$, with the residual as the difference between $\bf y$ and $\bf \hat y$.]

The estimated coefficients, $\bf \hat\beta_i$, are geometrically understood as the weights in the linear combination of the column vectors (observations on the variables $\bf x_i$) needed to produce the projected vector $\bf \hat y$. We have that $\bf H\,y = \hat y$; hence the mnemonic, "the H puts the hat on the y."

The hat matrix is calculated as: $\bf H = X (X^TX)^{-1}X^T$.

And the estimated coefficients are naturally calculated as $\bf \hat\beta = (X^TX)^{-1}X^T y$.
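As a quick numerical check of both formulas, here is a minimal sketch using the {mtcars} data that the example further down also uses (only standard base R functions):

    # Sketch: verify H = X (X'X)^-1 X' and beta_hat = (X'X)^-1 X' y on mtcars
    y = mtcars$mpg
    X = model.matrix(mpg ~ wt, mtcars)           # model matrix (intercept + wt columns)
    H = X %*% solve(t(X) %*% X) %*% t(X)         # hat matrix
    beta_hat = solve(t(X) %*% X) %*% t(X) %*% y  # estimated coefficients
    y_hat = H %*% y                              # "H puts the hat on the y"
    all.equal(as.vector(y_hat), as.vector(X %*% beta_hat))             # TRUE
    all.equal(as.vector(beta_hat), unname(coef(lm(mpg ~ wt, mtcars)))) # TRUE: matches lm()
    max(abs(t(X) %*% (y - y_hat)))               # ~0: residuals orthogonal to columns of X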

Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself. However, points at the extremes of the regressor values have more leverage. Here is an example of an extreme point (in red) really pulling the regression line away from what would be a more logical fit:

[Figure: scatterplot in which a single extreme point (red) pulls the fitted OLS line away from the bulk of the data.]
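The effect is easy to reproduce with made-up data (purely illustrative; the simulated numbers below are an assumption, not part of the original example): fitting with and without a single extreme-$x$ point shows how much one observation can move the OLS slope.

    # Toy simulation: one point far out in x drags the OLS line
    set.seed(1)
    x = runif(30, 0, 10)
    y = 2 + 0.5 * x + rnorm(30, sd = 1)
    x_out = c(x, 30)   # add a point at an extreme regressor value...
    y_out = c(y, 2)    # ...whose y value does not follow the trend
    coef(lm(y ~ x))                          # slope close to the true 0.5
    coef(lm(y_out ~ x_out))                  # slope dragged toward the single extreme point
    which.max(hatvalues(lm(y_out ~ x_out)))  # the added point (row 31) has the largest leverage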

So, where is the connection between these two concepts? The leverage score of a particular row or observation in the dataset is found in the corresponding entry on the diagonal of the hat matrix: for observation $i$ the leverage score is $\bf H_{ii}$. This entry measures the direct influence of $y_i$ on its own fitted value $\hat y_i$ (a high-leverage $i\text{-th}$ observation largely determines its own prediction $\hat y_i$):

$$\hat y_i = h_{ii}\, y_i + \sum_{j \neq i} h_{ij}\, y_j$$
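A minimal sketch of this point (reusing the same mtcars fit as below): row $i$ of $\bf H$ holds the weights that every observation receives in $\hat y_i$, and $h_{ii}$ is the weight on $y_i$ itself.

    # Row i of H gives the weights of all observations in y_hat_i
    fit = lm(mpg ~ wt, mtcars)
    X = model.matrix(fit)
    H = X %*% solve(t(X) %*% X) %*% t(X)
    i = 1
    sum(H[i, ] * mtcars$mpg)   # y_hat_i as a weighted sum of all the y's
    fitted(fit)[i]             # the same value, straight from lm()
    H[i, i]                    # weight of y_i in its own fitted value = its leverage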

Since the hat matrix is a projection matrix, its eigenvalues are $0$ and $1$. It follows that the trace (the sum of the diagonal elements, which equals the sum of the eigenvalues, here a sum of $1$'s) equals the rank of the column space of $\bf X$, i.e. the number of estimated parameters $p$, while there are as many zero eigenvalues as the dimension of the null space. Hence each diagonal entry of the hat matrix lies between $0$ and $1$, and an observation is commonly considered to have high leverage if $h_{ii} > 2\sum_{i=1}^{n}h_{ii}/n = 2p/n$, with $n$ being the number of rows.
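These facts are easy to check numerically; the sketch below (again on the mtcars fit used throughout) inspects the eigenvalues and trace of $\bf H$ and flags observations above the $2p/n$ cutoff.

    # The eigenvalues of H are (numerically) 0 or 1, and their sum is p
    fit = lm(mpg ~ wt, mtcars)
    X = model.matrix(fit)
    H = X %*% solve(t(X) %*% X) %*% t(X)
    round(eigen(H, symmetric = TRUE)$values, 10)  # two 1's (intercept + wt), the rest 0
    sum(diag(H))                                  # trace = 2 = p
    p = ncol(X); n = nrow(X)
    which(hatvalues(fit) > 2 * p / n)             # observations flagged as high leverage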

The leverage of an outlying data point in the model matrix can also be computed manually as one minus the ratio of the residual for that point when it is included in the OLS fit to the residual for the same point when the fit is computed without it (i.e. its out-of-sample prediction residual): $$\text{Leverage} = 1-\frac{\text{residual OLS with outlier}}{\text{residual OLS without outlier}}$$ In R the function hatvalues() returns these values for every point.

Using the first data point in the dataset {mtcars} in R:

    fit = lm(mpg ~ wt, mtcars) # OLS including all points
    X = model.matrix(fit) # X model matrix
    hat_matrix = X%*%(solve(t(X)%*%X)%*%t(X)) # Hat matrix
    diag(hat_matrix)[1] # First diagonal point in Hat matrix
    fitwithout1 = lm(mpg ~ wt, mtcars[-1,]) # OLS excluding first data point.
    new = data.frame(wt=mtcars[1,'wt']) # Predicting y hat in this OLS w/o first point.
    y_hat_without = predict(fitwithout1, newdata=new) # ... here it is.
    residuals(fit)[1] # The residual when OLS includes data point.
    lev = 1 - (residuals(fit)[1]/(mtcars[1,'mpg'] -  y_hat_without)) # Leverage
    all.equal(diag(hat_matrix)[1],lev) #TRUE