Solved – Estimation of exponential model

estimation, nonlinear regression

An exponential model is a model described by the following equation:
$$\hat{y_{i}}=\beta_{0}\cdot e^{\beta_{1}x_{1i}+\ldots+\beta_{k}x_{ki}}$$

The most common approach used to estimate such a model is linearization, which can be done easily by taking logarithms of both sides. What are the other approaches? I'm especially interested in those which can handle $y_{i}=0$ in some observations.
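For reference, here is a minimal sketch of the log-linearization approach (simulated data; the names `X` and `y` are illustrative). Note that zeros must be excluded or offset before taking logs, which is exactly the difficulty raised above:

```python
import numpy as np

# Hypothetical data: X is an (n, k) design matrix, y an (n,) positive response.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.5, size=(200, 3)).astype(float)
y = 2.0 * np.exp(X @ np.array([0.5, -0.3, 0.8]) + rng.normal(0, 0.2, 200))

# Linearization: log(y) = log(beta0) + beta1*x1 + ... + betak*xk.
# Only valid for strictly positive y -- zero observations must be dropped.
pos = y > 0
A = np.column_stack([np.ones(pos.sum()), X[pos]])   # intercept column + predictors
coef, *_ = np.linalg.lstsq(A, np.log(y[pos]), rcond=None)
beta0_hat = np.exp(coef[0])                          # back-transform the intercept
beta_hat = coef[1:]
print(beta0_hat, beta_hat)
```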

Update 31.01.2011
I'm aware of the fact that this model can't produce zero. I'll elaborate a bit on what I'm modeling and why I chose this model. Let's say we want to predict how much money a client spends in a shop. Of course, many clients are just looking and don't buy anything, which is why there are zeros. I didn't want to use a linear model because it produces a lot of negative values, which don't make any sense. The other reason is that this model works really well, much better than the linear one. I used a genetic algorithm to estimate the parameters, so it wasn't a 'scientific' approach. Now I'd like to know how to deal with this problem using more scientific methods. It can also be assumed that most, or even all, of the variables are binary.

Best Answer

There are several issues here.

(1) The model needs to be explicitly probabilistic. In almost all cases there will be no set of parameters for which the left-hand side matches the right-hand side for all your data: there will be residuals. You need to make assumptions about those residuals. Do you expect them to be zero on the average? To be symmetrically distributed? To be approximately normally distributed?

Here are two models that agree with the one specified but allow drastically different residual behavior (and therefore will typically result in different parameter estimates). You can vary these models by varying assumptions about the joint distribution of the $\epsilon_{i}$:

$$\text{A:}\ y_{i} =\beta_{0} \exp{\left(\beta_{1}x_{1i}+\ldots+\beta_{k}x_{ki} + \epsilon_{i}\right)}$$ $$\text{B:}\ y_{i} =\beta_{0} \exp{\left(\beta_{1}x_{1i}+\ldots+\beta_{k}x_{ki}\right)} + \epsilon_{i}.$$

(Note that these are models for the data $y_i$; there usually is no such thing as an estimated data value $\hat{y_i}$.)
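To make the contrast concrete, here is a hedged sketch of fitting model (B) directly by nonlinear least squares with `scipy.optimize.curve_fit` (the data and starting values are illustrative). The response stays on its original scale, so exact zeros in $y$ cause no difficulty; model (A) would instead be fit on the log scale, as in the sketch under the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def model_b(X, beta0, *betas):
    """Mean function of model (B): beta0 * exp(X @ betas)."""
    return beta0 * np.exp(X @ np.asarray(betas))

# Hypothetical data with some exact zeros in y (e.g., non-buying customers).
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.5, size=(200, 3)).astype(float)
y = np.maximum(0.0, 2.0 * np.exp(X @ np.array([0.5, -0.3, 0.8]))
               + rng.normal(0, 1.0, 200))

# Additive-error fit (model B): minimizes the sum of squared residuals.
p0 = np.ones(1 + X.shape[1])                 # starting values for beta0, beta1..betak
params, cov = curve_fit(model_b, X, y, p0=p0)
print("model B estimates:", params)
```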

(2) The need to handle zero values for the y's implies the stated model (A) is both wrong and inadequate, because it cannot produce a zero value no matter what the random error equals. The second model above (B) allows for zero (or even negative) values of y's. However, one should not choose a model solely on such a basis. To reiterate #1: it is important to model the errors reasonably well.

(3) Linearization changes the model. Typically, it results in models like (A) but not like (B). It is used by people who have analyzed their data enough to know this change will not appreciably affect the parameter estimates and by people who are ignorant of what is happening. (It is hard, many times, to tell the difference.)

(4) A common way to handle the possibility of a zero value is to propose that $y$ (or some re-expression thereof, such as the square root) has a strictly positive chance of equaling zero. Mathematically, we are mixing a point mass (a "delta function") in with some other distribution. These models look like this:

$$\eqalign{ f(y_i) &\sim F(\mathbf{\theta}); \cr \theta_j &= \beta_{j0} + \beta_{j1} x_{1i} + \cdots + \beta_{jk} x_{ki} }$$

where $\Pr_{F_\theta}[f(Y) = 0] = \theta_{j+1} \gt 0$ is one of the parameters implicit in the vector $\mathbf{\theta}$, $F$ is some family of distributions parameterized by $\theta_1, \ldots, \theta_j$, and $f$ is the reexpression of the $y$'s (the "link" function of a generalized linear model: see onestop's reply). (Of course, then, $\Pr_{F_\theta}[f(Y) \le t]$ = $(1 - \theta_{j+1})F_\theta(t)$ when $t \ne 0$.) Examples are the zero-inflated Poisson and Negative Binomial models.
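For the shop-spend scenario in the question (continuous positive spend with many exact zeros), a closely related two-part ("hurdle") formulation is often the easiest such mixture to fit: a logistic model for whether $y$ is zero, and a log-linear model for the positive amounts. The sketch below assumes `statsmodels` is available and uses simulated data with illustrative names; it is not the zero-inflated Poisson/NB model itself, but the same point-mass-plus-distribution idea.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: binary predictors X, spend y with many exact zeros.
rng = np.random.default_rng(2)
X = rng.binomial(1, 0.5, size=(500, 3)).astype(float)
lin = X @ np.array([0.5, -0.3, 0.8])
buys = rng.random(500) < 1 / (1 + np.exp(-(lin - 0.2)))   # who spends anything
y = np.where(buys, 2.0 * np.exp(lin + rng.normal(0, 0.3, 500)), 0.0)

Xc = sm.add_constant(X)

# Part 1: probability of a nonzero outcome (the point mass at zero).
zero_part = sm.Logit((y > 0).astype(int), Xc).fit(disp=False)

# Part 2: log-linear model for the positive outcomes only.
pos = y > 0
positive_part = sm.OLS(np.log(y[pos]), Xc[pos]).fit()

print(zero_part.params)
print(positive_part.params)
# A prediction combines both parts: E[y | x] ≈ P(y > 0 | x) * E[y | y > 0, x].
```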

(5) The issues of constructing a model and fitting it are related but different. As a simple example, even an ordinary regression model $Y = \beta_0 + \beta_1 X + \epsilon$ can be fit in many ways by means of least squares (which gives the same parameter estimates as Maximum Likelihood and almost the same standard errors), iteratively reweighted least squares, various other forms of "robust least squares," etc. The choice of fitting is often based on convenience, expedience (e.g., availability of software), familiarity, habit, or convention, but at least some thought should be given to what is appropriate for the assumed distribution of the error terms $\epsilon_i$, to what the loss function for the problem might reasonably be, and to the possibility of exploiting additional information (such as a prior distribution for the parameters).
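As a small illustration of how the same simple model can be fit in more than one way, here is a hedged sketch (simulated, contaminated data; `statsmodels` assumed available) comparing ordinary least squares with a Huber-type robust fit computed by iteratively reweighted least squares:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data for Y = beta0 + beta1*X + eps, with a few gross outliers.
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 100)
y[:5] += 10.0                                   # contaminate a few observations

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()                    # least squares (= ML under normal errors)
rlm_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # iteratively reweighted LS

print("OLS:   ", ols_fit.params)
print("Huber: ", rlm_fit.params)
```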
