How to test whether two parameter estimates in the same model are significantly different

Tags: nonlinear regression, statistical significance

I have the model

$$
y=x^a \times z^b + e
$$

where $y$ is the dependent variable, $x$ and $z$ are explanatory variables, $a$ and $b$ are the parameters and $e$ is an error term. I have parameter estimates of $a$ and $b$ and a covariance matrix of these estimates. How do I test if $a$ and $b$ are significantly different?

Best Answer

Assessing the hypothesis that $a$ and $b$ are different is equivalent to testing the null hypothesis $a - b = 0$ (against the alternative that $a-b\ne 0$).

The following analysis presumes it is reasonable for you to estimate $a-b$ as $$U = \hat a - \hat b.$$ It also accepts your model formulation, which often is a reasonable one. Because the errors are additive (and could even produce negative observed values of $y$), the model cannot be linearized by taking logarithms of both sides.

The variance of $U$ can be expressed in terms of the covariance matrix $(c_{ij})$ of $(\hat a, \hat b)$ as

$$\operatorname{Var}(U) = \operatorname{Var}(\hat a - \hat b) = \operatorname{Var}(\hat a) + \operatorname{Var}(\hat b) - 2 \operatorname{Cov}(\hat a, \hat b) = c_{11} + c_{22} - 2c_{12}.$$
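In R, this variance can be read directly off the estimated covariance matrix of the fitted model. Here is a minimal sketch, assuming `fit` is a fitted model object (such as the `nls` fit in the simulation code at the end) whose parameters are named `a` and `b`:

cc <- vcov(fit)  # Estimated covariance matrix of (a.hat, b.hat)
var.U <- cc["a", "a"] + cc["b", "b"] - 2 * cc["a", "b"]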

When $(\hat a, \hat b)$ is estimated with least squares, one usually uses a "t test;" that is, the distribution of $$t = U / \sqrt{\operatorname{Var}(U)}$$ is approximated by a Student t distribution with $n-2$ degrees of freedom (where $n$ is the data count and $2$ counts the number of coefficients). Regardless, $t$ usually is the basis of any test. You may perform a Z test (when $n$ is large or when fitting with Maximum Likelihood) or bootstrap it, for instance.

To be specific, the p-value of the t test is given by

$$p = 2t_{n-2}(-|t|)$$

where $t_{n-2}$ is the Student t (cumulative) distribution function. It is one expression for the "tail area:" the chance that a Student t variable (of $n-2$ degrees of freedom) equals or exceeds the size of the test statistic, $|t|.$
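In R, `pt` is the Student t (cumulative) distribution function, so the whole test takes a few lines. Continuing the sketch above (with the hypothetical `fit` and `var.U`, and `n` the number of observations):

U <- coef(fit)["a"] - coef(fit)["b"]   # Estimate of a - b
t.stat <- U / sqrt(var.U)              # Test statistic
p <- 2 * pt(-abs(t.stat), df = n - 2)  # p = 2 * t_{n-2}(-|t|)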


More generally, for numbers $c_1,$ $c_2,$ and $\mu$ you can use exactly the same approach to test any hypothesis

$$H_0: c_1 a + c_2 b = \mu$$

against the two-sided alternative. (This encompasses the special but widespread case of a "contrast.") Use the estimated variance-covariance matrix $(c_{ij})$ to estimate the variance of $U = c_1 \hat a + c_2 \hat b,$ which is $$\operatorname{Var}(U) = c_1^2 c_{11} + 2 c_1 c_2 c_{12} + c_2^2 c_{22},$$ and form the statistic

$$t = (c_1 \hat a + c_2 \hat b - \mu) / \sqrt{\operatorname{Var}(U)}.$$

The foregoing is the case $(c_1,c_2) = (1,-1)$ and $\mu=0.$
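As a sketch in R (still using the hypothetical `fit`), the general version differs only in the vector of coefficients; the simulation code below performs exactly this computation with `H.0 <- c(1, -1)` and `mu <- 0`:

H.0 <- c(1, -1)                 # Coefficients (c1, c2); this is the special case above
mu <- 0                         # Hypothesized value
U <- sum(H.0 * coef(fit)) - mu  # c1 * a.hat + c2 * b.hat - mu
t.stat <- drop(U / sqrt(H.0 %*% vcov(fit) %*% H.0))
p <- 2 * pt(-abs(t.stat), df = n - 2)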


To check that this advice is correct, I ran the R code below to create data according to this model (with Normally distributed errors $e$), fit them, and compute the values of $t$ many times. The check is that the probability plot of $t$ (based on the assumed Student t distribution) closely follows the diagonal. Here is that plot in a simulation of size $500$ where $n=5$ (a very small dataset, chosen because the $t$ distribution is far from Normal) and $a=b=-1/2.$

[Figure: probability plot of the simulated test statistics against the Student t reference distribution]

In this example, at least, the procedure works beautifully. Consider re-running the simulation using parameters $a,$ $b,$ $\sigma$ (the error standard deviation), and $n$ that reflect your situation.

Here is the code.

#
# Specify the true parameters.
#
set.seed(17)
a <- -1/2
b <- -1/2
sigma <- 0.25 # Standard deviation of the errors
n <- 5        # Sample size
n.sim <- 500  # Simulation size
#
# Specify the hypothesis.
#
H.0 <- c(1, -1) # Coefficients of `a` and `b`.
mu <- 0         # Hypothesized value of the contrast.
#
# Provide x and z values in terms of their logarithms.
#
log.x <- log(rexp(n))
log.z <- log(rexp(n))
#
# Compute y without error.
#
y.0 <- exp(a * log.x + b * log.z)
#
# Conduct a simulation to estimate the sampling distribution of the t statistic.
#
sim <- replicate(n.sim, {
  #
  # Add the errors.
  #
  e <- rnorm(n, 0, sigma)
  df <- data.frame(log.x=log.x, log.z=log.z, y.0, y=y.0 + e)
  #
# Guess the solution with OLS on the logs (only positive y can be logged).
  #
  fit.ols <- lm(log(y) ~ log.x + log.z - 1, subset(df, y > 0))
  start <- coefficients(fit.ols) # Initial values of (a.hat, b.hat)
  #
  # Polish it using nonlinear least squares.
  #
  fit <- nls(y ~ exp(a * log.x + b * log.z), df, list(a=start[1], b=start[2]))
  #
  # Test a hypothesis.
  #
  cc <- vcov(fit)
  s <- sqrt((H.0 %*% cc %*% H.0))
  (crossprod(H.0, coef(fit)) - mu) / s
})
#
# Display the simulation results.
#
summary(lm(sort(sim) ~ 0 + qt(ppoints(length(sim)), df=n-2))) # Slope should be near 1.
qqplot(qt(ppoints(length(sim)), df=n-2), sim, 
       pch=21, bg="#00000010", col="#00000040",
       xlab="Student t reference value", 
       ylab="Test statistic")
abline(0:1, col="Red", lwd=2)