I have just started learning machine learning, and I find the gradient descent algorithm difficult to understand. I am working through Andrew Ng's Machine Learning course on Coursera. All of his lectures in the second week revolve around gradient descent. Since I don't understand this formula, I am unable to move further in the course. Kindly help.

Linear regression minimizes the cost function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(x_i) - y_i \right)^2 $$

The hypothesis of the linear model is

$$h_\theta(x) = \theta \cdot x = \theta_0 + \theta_1 x_1$$

How to read this formula? What are $h_\theta$, $x_i$ and $y_i$?

How do I decide which data goes with which parameter?

Why 1/2m?

## Best Answer

I've already asked this question, it's here. Let me repost the formula in my original question:

$$ \frac{1}{m} \sum _{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2 $$

Here we are trying to minimise the cost of the errors (i.e. the residuals) between our model and our data points. It's called a cost function because the errors are "costs": the smaller the errors your model makes, the better your model is.

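$h_\theta(X^{(i)})$ is the prediction from your regression model for the $i$-th data point. $Y^{(i)}$ is the dependent variable and $X^{(i)}$ is the independent variable. Here, we are adding up the error for each data point. Each error is squared because the difference can be negative, and without squaring, positive and negative errors would cancel each other out.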
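As a rough illustration of the paragraph above, here is a minimal Python sketch. The toy data and the names `h` and `cost` are made up for this example and are not from the course. It computes the averaged squared error from the formula above and shows why the errors are squared first.

```python
# Hypothetical toy data: xs holds the independent variable, ys the dependent one.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]


def h(theta0, theta1, x):
    """Prediction of the straight-line model theta0 + theta1 * x for one input."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Average squared error: (1/m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / m


# A flat line at height 4 is a poor fit, yet its raw errors (2, 0, -2) sum to zero.
raw_errors = [h(4.0, 0.0, x) - y for x, y in zip(xs, ys)]
print(sum(raw_errors))          # positive and negative errors cancel
print(cost(4.0, 0.0, xs, ys))   # squaring exposes the poor fit
print(cost(0.0, 2.0, xs, ys))   # the line y = 2x passes through every point
```

Comparing the last two calls shows the point of the cost function: the better line gets the lower cost.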

We've added the $\frac{1}{m}$ term so that we minimise the average error rather than the total. Please check my original question for a nice explanation. The $\frac{1}{2}$ term is there simply to cancel out the 2 that comes from the square when taking the first derivative. We don't strictly need either term, since multiplying the cost by a positive constant does not change which parameters minimise it, but the mathematics is cleaner with them.
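To see the cancellation concretely, here is a short worked derivative. It assumes the hypothesis from the question, $h_\theta(x) = \theta_0 + \theta_1 x$, so that $\partial h_\theta(x) / \partial \theta_1 = x$:

$$ \frac{\partial}{\partial \theta_1} \left[ \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2 \right] = \frac{1}{2m} \sum_{i=1}^m 2 \left(h_\theta(X^{(i)})-Y^{(i)}\right) X^{(i)} = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right) X^{(i)} $$

The derivative with respect to $\theta_0$ has the same form without the trailing $X^{(i)}$ factor, because $\partial h_\theta(x) / \partial \theta_0 = 1$.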

You can think of it like this: "I'm given a bunch of $Y^{(i)}$ data points. How do I draw a straight line so that it fits as closely to them as possible? My approach would be to calculate the error for each data point, and try to adjust my line to minimise those errors."
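The "adjust my line" step is what gradient descent does. Below is a minimal sketch, not the course's reference implementation. It assumes the $\frac{1}{2m}$ cost from the question and the straight-line hypothesis $\theta_0 + \theta_1 x$. The function name, the toy data, the learning rate `alpha` and the number of steps are all illustrative choices.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    """Fit theta0 + theta1 * x by repeatedly stepping against the cost gradient."""
    theta0, theta1 = 0.0, 0.0  # start from a flat line through the origin
    m = len(xs)
    for _ in range(steps):
        # Error of the current line at every data point: h(x_i) - y_i.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the (1/2m) cost; the 1/2 has cancelled the 2.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together, moving downhill on the cost.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical data generated from the line y = 1 + 2x, with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this noise-free data the result should end up very close to an intercept of 1 and a slope of 2. If `alpha` is set too large, the updates can overshoot and diverge, so treat the value here as something to tune for your own data.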