Finding intercept term in fixed effects model

fixed-effects-modelpanel dataplmrregression

I am using the plm() function in R with a fixed effects model:

fixed <- plm(Y ~ X, data = pdata, index = c("sireID", "cropNum"), model = "within")

The output is then:

> summary(fixed)
Oneway (individual) effect Within Model

Call:
plm(formula = Y ~ X, data = pdata, model = "within")

Unbalanced Panel: n = 256, T = 1-16, N = 1039

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-12.2822  -2.1441   0.0000   1.8906  32.6836 

Coefficients:
   Estimate Std. Error t-value Pr(>|t|)    
X1 0.643017   0.042771 15.0340   <2e-16 ***
X2 1.945723   1.745879  1.1145   0.2654    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    17203
Residual Sum of Squares: 13330
R-Squared:      0.2251
Adj. R-Squared: -0.029887
F-statistic: 113.439 on 2 and 781 DF, p-value: < 2.22e-16

I want to obtain the intercept for this fixed effects model. Is this possible to do using plm in R? I tried using the solution given here by adding effect = "twoways" to the plm() function but this didn't produce an intercept term.

Best Answer

The fixed effects model your estimating is akin to estimating a separate intercept for each sireID. The unit-specific intercepts don't appear in your summary output, but you can ask for them with the following function: fixef(fixed).

Estimating fixed effects along the other dimension (e.g., effect = "twoways) isn't going to return a global intercept. The main takeaway from the post you referenced is that the overall intercept $\alpha$ is perfectly collinear with $\alpha_i$, as the sum of all the unit-specific effects (i.e., $\alpha_i$'s) is one.

Estimating your equation using lm() and including dummies for all units will return an intercept, but the intercept is simply the effect for the omitted dummy. To see this in practice, I will use the Produc dataset (U.S. state production from 1970 to 1986) from the plm package.

# Load the plm package and the Produc dataset

library(plm)
data("Produc", package = "plm")

# Estimate a one-way fixed effects model
# Extract the state-specific intercepts

mod_within <- plm(hwy ~ gsp, index = c("state", "year"), model = "within", effect = "individual", data = Produc)
summary(fixef(mod_within))

The fixef() function extracts the fixed effects from a plm object. The output below reports the unique state-specific intercepts. Remember the effect for Alabama in the first row.

# Summary output from the fixef() function
# These are the state-specific intercepts...

             Estimate Std. Error t-value  Pr(>|t|)    
ALABAMA         7390.71     185.97  39.742 < 2.2e-16 ***
ARIZONA         4749.54     180.96  26.246 < 2.2e-16 ***
ARKANSAS        3761.83     175.90  21.387 < 2.2e-16 ***
CALIFORNIA     37091.91     697.28  53.195 < 2.2e-16 ***
COLORADO        4501.00     186.95  24.076 < 2.2e-16 ***
CONNECTICUT     6142.66     192.52  31.907 < 2.2e-16 ***
DELAWARE        1897.09     171.51  11.061 < 2.2e-16 ***
FLORIDA        13175.73     266.92  49.362 < 2.2e-16 ***
GEORGIA         8331.90     208.42  39.977 < 2.2e-16 ***
IDAHO           2369.09     171.82  13.789 < 2.2e-16 ***
ILLINOIS       22536.50     360.88  62.449 < 2.2e-16 ***
INDIANA         8559.26     213.81  40.033 < 2.2e-16 ***
IOWA            8951.42     184.49  48.520 < 2.2e-16 ***
KANSAS          5983.13     181.48  32.969 < 2.2e-16 ***
KENTUCKY        9925.08     187.95  52.808 < 2.2e-16 ***
LOUISIANA       9972.78     217.73  45.804 < 2.2e-16 ***
MAINE           2205.62     172.15  12.812 < 2.2e-16 ***
MARYLAND        8259.07     198.04  41.704 < 2.2e-16 ***
MASSACHUSETTS   7203.77     225.36  31.965 < 2.2e-16 ***
MICHIGAN       16473.75     288.12  57.178 < 2.2e-16 ***
MINNESOTA      10479.91     199.22  52.604 < 2.2e-16 ***
MISSISSIPPI     5435.86     176.62  30.777 < 2.2e-16 ***
MISSOURI        9406.42     206.89  45.465 < 2.2e-16 ***
MONTANA         3644.11     171.82  21.208 < 2.2e-16 ***
NEBRASKA        4359.62     175.03  24.907 < 2.2e-16 ***
NEVADA          1969.22     172.23  11.434 < 2.2e-16 ***
NEW_HAMPSHIRE   1965.13     171.93  11.430 < 2.2e-16 ***
NEW_JERSEY     10672.72     262.80  40.612 < 2.2e-16 ***
NEW_MEXICO      3158.47     174.12  18.140 < 2.2e-16 ***
NEW_YORK       29833.23     529.86  56.304 < 2.2e-16 ***
NORTH_CAROLINA  7987.01     212.46  37.593 < 2.2e-16 ***
NORTH_DAKOTA    2746.73     171.58  16.008 < 2.2e-16 ***
OHIO           21086.53     317.17  66.484 < 2.2e-16 ***
OKLAHOMA        5029.61     188.38  26.700 < 2.2e-16 ***
OREGON          5566.66     180.65  30.815 < 2.2e-16 ***
PENNSYLVANIA   21292.22     325.81  65.351 < 2.2e-16 ***
RHODE_ISLAND    1783.01     172.00  10.366 < 2.2e-16 ***
SOUTH_CAROLINA  3806.90     179.89  21.162 < 2.2e-16 ***
SOUTH_DAKOTA    3024.36     171.36  17.650 < 2.2e-16 ***
TENNESSE        9005.20     195.23  46.125 < 2.2e-16 ***
TEXAS          24581.81     464.77  52.890 < 2.2e-16 ***
UTAH            3296.01     173.52  18.995 < 2.2e-16 ***
VERMONT         1792.38     171.08  10.477 < 2.2e-16 ***
VIRGINIA       11836.57     212.64  55.666 < 2.2e-16 ***
WASHINGTON      9110.27     198.90  45.804 < 2.2e-16 ***
WEST_VIRGINIA   6054.88     175.35  34.530 < 2.2e-16 ***
WISCONSIN       9116.06     203.39  44.820 < 2.2e-16 ***
WYOMING         2703.02     171.94  15.721 < 2.2e-16 ***

Estimating dummies for each state using lm() results in equivalent estimates. However, we must drop a state since all the state dummies sum to one. The state variable in the Produc dataset is a factor variable; it's ordered alphabetically. R drops the first factor by default, which is Alabama.

# The least squares dummy variable (LSDV) estimator
# Include a dummy variable for each unit (i.e., state)

mod_lsdv <- lm(hwy ~ gsp + as.factor(state), data = Produc)

# Extract the intercept

mod_lsdv$coefficients['(Intercept)']
(Intercept) 
    7390.71 

Note how the indicator for Alabama doesn't necessarily get dropped; it actually gets absorbed into the intercept. The omitted state effect could be any state in the panel.

In short, the intercepts usually aren't the focus in a fixed effects analysis; they're nuisance. In your case, the principal focus should be on X1 and X2.


In accordance with the suggestion provided by @Helix123, I wanted to note that there is, in fact, a within_intercept() function in the plm package. I always found it a bit artificial but it does return an "overall intercept" for within models and its accompanying standard error.

Under the hood, it's actually a weighted mean of the fixed effects. You could calculate this fairly easily in the one-way case using the following code:

# The overall intercept is just a weighted mean of the state effects

weighted.mean(fixef(mod_within), pdim(mod_within)$Tint$Ti)

Simply feed the weighted.mean() function the individual state effects, followed by a numeric vector of the weights, and you got your overall intercept.

Related Question