Solving the Cost Function using the Derivative

linear algebra, linear regression, machine learning

I am currently learning linear regression, in particular the cost function. Here is the problem I am working on:

Suppose we have a training set with $m=3$ examples, the points $(1,1)$, $(2,2)$ and $(3,3)$. The hypothesis function is $h_\theta(x)=\theta_1x$ with a single parameter $\theta_1$. The cost function is $J(\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2$. We need to find $J(0)$, which is a relatively easy task if done by hand (and I have already done it).

I am interested in doing it through a derivative.
If I do it this way, I get $J'(\theta_1)=\frac{1}{2m}\sum_{i=1}^m(2(h_{\theta_1}x^i-y^i))h_\theta$ (hopefully I haven't made any mistakes). Then, to find the value(s) of $\theta_1$ that minimize $J$, all I need to do is solve $J'(\theta_1)=0$. That's where I have a few questions.

Can we assume that the sum will never be zero? If so, when I solve this equation I find that the only way for it to hold is for $h_{\theta_1}$ to be zero, which doesn't seem right, or for $2(h_{\theta_1}x^i-y^i)$ to be zero for every pair $(x,y)$. That is, $\theta_1=1$. Is my reasoning correct?

Best Answer

You know that $h_\theta(x)=\theta_1x$. Thus the cost function is

$$J(\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2=\frac{1}{2m}\sum_{i=1}^m(\theta_1x^i -y^i)^2$$
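As a quick sanity check, the cost function above can be evaluated directly on the three points from the question. This is a minimal sketch; the names `xs`, `ys` and `cost` are illustrative choices, not anything from the original post.

```python
# Training set from the question: (1, 1), (2, 2), (3, 3)
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def cost(theta1, xs, ys):
    """J(theta1) = 1/(2m) * sum over i of (theta1 * x_i - y_i)^2."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


j_at_zero = cost(0.0, xs, ys)  # (1 + 4 + 9) / 6 = 14/6
j_at_one = cost(1.0, xs, ys)   # every residual is zero here
print(j_at_zero, j_at_one)
```

The value at $\theta_1=0$ is $14/6\approx 2.33$, and the cost drops to $0$ at $\theta_1=1$ because the line passes through all three points.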

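To find the minimum, set the first derivative equal to $0$. Differentiating with the chain rule, the factor $2$ from the square cancels the $\frac12$ in front of the sum, and the inner derivative of $\theta_1x^i-y^i$ with respect to $\theta_1$ is $x^i$.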

$$J^{'}(\theta_1)=\frac{1}{m}\sum_{i=1}^m(\theta_1x^i -y^i)\cdot x^i=0$$
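One way to gain confidence in the derivative is to compare it against a central finite difference. The sketch below assumes the question's three data points; the function names and the step size `eps` are illustrative choices.

```python
# Training set from the question: (1, 1), (2, 2), (3, 3)
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def cost(theta1):
    """J(theta1) = 1/(2m) * sum of squared residuals."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_derivative(theta1):
    """Analytic derivative: J'(theta1) = 1/m * sum((theta1 * x - y) * x)."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m


def numeric_derivative(f, t, eps=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)


for theta1 in (0.0, 0.5, 1.0, 2.0):
    analytic = cost_derivative(theta1)
    numeric = numeric_derivative(cost, theta1)
    print(theta1, analytic, numeric)
```

The two columns should agree up to rounding error, and both vanish at $\theta_1=1$.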

Because the right-hand side is $0$, the factor $\frac1m$ can be dropped. Each term then gets its own sum.

$$\sum_{i=1}^m\theta_1(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$

$\theta_1$ can be factored out since it does not depend on the index $i$.

$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$

$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 =\sum_{i=1}^my^i\cdot x^i$$

$$\hat \theta_1=\frac{\sum\limits_{i=1}^my^i\cdot x^i}{\sum\limits_{i=1}^m(x^i)^2}$$
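As a side check, assuming at least one $x^i$ is nonzero (so the denominator does not vanish), the second derivative confirms that this stationary point is a minimum:

$$J''(\theta_1)=\frac{1}{m}\sum_{i=1}^m(x^i)^2>0$$

For the three points in the question, $\sum_{i=1}^m(x^i)^2=14$, so the condition holds.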

We can insert your values.

$$\hat \theta_1=\frac{ 1\cdot 1+2\cdot 2+3\cdot 3}{ 1^2+2^2+3^2}=1$$

In your case the regression line is $h_\theta(x)=1\cdot x=x$.
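The closed-form estimate can also be computed in a few lines. This is a minimal sketch for the one-parameter model $h_\theta(x)=\theta_1x$ only; `fit_theta1` is a hypothetical helper name, and the zero-denominator guard is an added assumption about how to handle degenerate input.

```python
def fit_theta1(xs, ys):
    """Least-squares slope for a line through the origin:
    theta1 = sum(x * y) / sum(x^2)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        # All x are zero, so the slope is not determined by the data.
        raise ValueError("all x values are zero; theta1 is undefined")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / sum_xx


# Training set from the question: (1, 1), (2, 2), (3, 3)
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta1_hat = fit_theta1(xs, ys)
print(theta1_hat)  # 14 / 14 = 1.0
```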

Are the steps comprehensible, and do they answer your questions? If not, feel free to ask.
