The $\frac{1}{m}$ is there to "average" the squared error over the $m$ data points, so that the size of the dataset doesn't affect the scale of the function (see John's answer).
So now the question is why there is an extra $\frac{1}{2}$. In short, it doesn't matter. The solution that minimizes $J$ as you have written it will also minimize $2J=\frac{1}{m} \sum_i (h(x_i)-y_i)^2$. The latter function, $2J$, may seem more "natural," but the factor of $2$ does not matter when optimizing.
The only reason some authors like to include it is that when you take the derivative with respect to the parameters $\theta$, the $2$ produced by differentiating the square cancels the $\frac{1}{2}$.
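As a quick numerical check of this point, here is a minimal sketch in Python. The toy data, the one-parameter model $h(x)=\theta x$, and the grid of candidate slopes are all assumptions made for illustration only; they are not taken from the question.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8]

def cost(theta, scale):
    """Scaled mean squared error for the assumed model h(x) = theta * x."""
    m = len(xs)
    return scale / m * sum((theta * x - y) ** 2 for x, y in zip(xs, ys))

# Coarse grid of candidate slopes from 0.00 to 2.00 in steps of 0.01.
grid = [k / 100 for k in range(0, 201)]

best_half = min(grid, key=lambda t: cost(t, 0.5))  # minimizes J (with the 1/2)
best_full = min(grid, key=lambda t: cost(t, 1.0))  # minimizes 2J (without it)

print(best_half, best_full)  # the two minimizers coincide
```

Multiplying a function by a positive constant rescales its values but leaves the location of its minimum unchanged, which is what the grid search reflects.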
Let's try to derive from first principles why the logarithm appears in the cost function of logistic regression.
Suppose we have a dataset $\mathbf{X}$ consisting of $m$ data points with $n$ features each, and a class variable $\mathbf{y}$, a vector of length $m$ whose entries each take the value 1 or 0.
Logistic regression says that the probability that the class variable $y_i = 1$, for $i = 1, 2, \dots, m$, can be modelled as follows:
$$
P( y_i =1 | \mathbf{x}_i ; \theta) = h_{\theta}(\mathbf{x}_i) = \dfrac{1}{1+e^{(- \theta^T \mathbf{x}_i)}}
$$
so $y_i = 1$ with probability $h_{\theta}(\mathbf{x}_i)$ and $y_i=0$ with probability $1-h_{\theta}(\mathbf{x}_i)$.
This can be combined into a single equation as follows (in fact, $y_i$ follows a Bernoulli distribution):
$$ P(y_i ) = h_{\theta}(\mathbf{x}_i)^{y_i} (1 - h_{\theta}(\mathbf{x}_i))^{1-y_i}$$
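$P(y_i)$ is known as the likelihood of the single data point $\mathbf{x}_i$: the probability that the model with parameters $\theta$ assigns to the observed label $y_i$ given $\mathbf{x}_i$. Below it is written as $P(\mathbf{x}_i | y_i)$, although strictly speaking it is $P(y_i | \mathbf{x}_i ; \theta)$, in line with the model definition above.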
The likelihood of the entire dataset $\mathbf{X}$ is the product of the individual data point likelihoods. Thus
$$ P(\mathbf{X}|\mathbf{y}) = \prod_{i=1}^{m} P(\mathbf{x}_i | y_i) = \prod_{i=1}^{m} h_{\theta}(\mathbf{x}_i)^{y_i} (1 - h_{\theta}(\mathbf{x}_i))^{1-y_i}$$
Now the principle of maximum likelihood says that we find the parameters that maximise likelihood $P(\mathbf{X}|\mathbf{y})$.
As mentioned in the comment, logarithms are used because they convert products into sums and, being monotonically increasing functions, do not change where the maximum lies. The likelihood here is a product, so we take the natural logarithm: maximising the likelihood is the same as maximising the log-likelihood. The log-likelihood $L(\theta)$ is:
$$ L(\theta) = \log(P(\mathbf{X}|\mathbf{y})) = \sum_{i=1}^{m} y_i \log(h_{\theta}(\mathbf{x}_i)) + (1-y_i) \log(1 - h_{\theta}(\mathbf{x}_i)) $$
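To illustrate why working with the sum of logarithms is convenient, here is a small Python sketch. The per-point likelihood values are hypothetical. It checks that the log of the product equals the sum of the logs, and shows a practical side benefit that the derivation itself does not rely on: a long product of probabilities can underflow in floating point, while the sum of logs stays finite.

```python
import math

# Hypothetical per-point likelihoods, for illustration only.
probs = [0.9, 0.8, 0.7, 0.95]

likelihood = math.prod(probs)                      # product form
log_likelihood = sum(math.log(p) for p in probs)   # sum-of-logs form

print(math.isclose(math.log(likelihood), log_likelihood))  # True

# With many data points the raw product underflows to 0.0 in double precision,
# while the sum of logs remains an ordinary finite number.
many = [0.5] * 2000
tiny_product = math.prod(many)
tiny_log_sum = sum(math.log(p) for p in many)
print(tiny_product, tiny_log_sum)
```

Note that `math.prod` requires Python 3.8 or later.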
Since in linear regression we found the $\theta$ that minimizes our cost function, here too, for the sake of consistency, we would like to have a minimization problem, and we want the average cost over all the data points. Currently, we have a maximization of $L(\theta)$. Maximization of $L(\theta)$ is equivalent to minimization of $-L(\theta)$. Using the average cost over all data points, our cost function for logistic regression comes out to be
$$ J(\theta) = - \dfrac{1}{m} L(\theta)$$
$$ = - \dfrac{1}{m} \left( \sum_{i=1}^{m} y_i \log (h_{\theta}(\mathbf{x}_i)) + (1-y_i) \log (1 - h_{\theta}(\mathbf{x}_i)) \right )$$
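A minimal Python sketch of this cost function follows. The name `logistic_cost`, the dataset, and the parameter values are hypothetical, and the sketch does no clipping, so it assumes every $h_{\theta}(\mathbf{x}_i)$ lies strictly between 0 and 1.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); adequate for the moderate z used here."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h_i = sigmoid(sum(t * x_j for t, x_j in zip(theta, x_i)))
        total += y_i * math.log(h_i) + (1 - y_i) * math.log(1 - h_i)
    return -total / m

# Hypothetical dataset; the first feature is a constant 1 acting as an intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

cost_zero = logistic_cost([0.0, 0.0], X, y)   # every h_i = 0.5, so J = log 2
cost_fit = logistic_cost([-4.5, 2.0], X, y)   # parameters that separate the classes
print(cost_zero, cost_fit)
```

With all parameters at zero every prediction is 0.5 and the cost equals $\log 2$; parameters that assign high probability to the correct labels give a lower cost.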
Now we can also see where the cost for a single data point comes from.
The cost for a single data point is $-\log( P(\mathbf{x}_i | y_i))$, which can be written as $ - \left ( y_i \log (h_{\theta}(\mathbf{x}_i)) + (1 - y_i) \log (1 - h_{\theta}(\mathbf{x}_i)) \right )$.
We can now split the above into two cases depending on the value of $y_i$. Thus we get
$J(h_{\theta}(\mathbf{x}_i), y_i) = - \log (h_{\theta}(\mathbf{x}_i)) , \text{ if } y_i=1$, and
$J(h_{\theta}(\mathbf{x}_i), y_i) = - \log (1 - h_{\theta}(\mathbf{x}_i)) , \text{ if } y_i=0 $.
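The equivalence between the two-case form and the single combined expression can be checked numerically. In this sketch the function names and the probability values are hypothetical.

```python
import math

def point_cost_cases(h, y):
    """Two-case form: -log(h) if y == 1, otherwise -log(1 - h)."""
    return -math.log(h) if y == 1 else -math.log(1 - h)

def point_cost_combined(h, y):
    """Single-expression form: -(y log h + (1 - y) log(1 - h))."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

# Compare the two forms on a few hypothetical predicted probabilities.
results = [
    (h, y, point_cost_cases(h, y), point_cost_combined(h, y))
    for h in (0.1, 0.5, 0.9)
    for y in (0, 1)
]
for h, y, by_cases, combined in results:
    print(h, y, by_cases, combined)
```

A confident correct prediction (for example $h = 0.9$ when $y_i = 1$) costs little, while a confident wrong one ($h = 0.1$ when $y_i = 1$) is penalized heavily.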
Best Answer
You know that $h_\theta(x)=\theta_1x$. Thus the cost function is
$$J(\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2=\frac{1}{2m}\sum_{i=1}^m(\theta_1x^i -y^i)^2$$
To find the minimum, we set the first derivative equal to $0$. The derivative is obtained with the chain rule.
$$J^{'}(\theta_1)=\frac{1}{m}\sum_{i=1}^m(\theta_1x^i -y^i)\cdot x^i=0$$
I omit the factor $\frac1m$, which is allowed because the right-hand side is $0$. Each summand gets its own sigma sign.
$$\sum_{i=1}^m\theta_1(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$
$\theta_1$ can be factored out since it does not depend on the index $i$.
$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$
$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 =\sum_{i=1}^my^i\cdot x^i$$
$$\hat \theta_1=\frac{\sum\limits_{i=1}^my^i\cdot x^i}{\sum\limits_{i=1}^m(x^i)^2}$$
We can insert your values.
$$\hat \theta_1=\frac{ 1\cdot 1+2\cdot 2+3\cdot 3}{ 1^2+2^2+3^2}=1$$
In your case the regression line is $h_\theta(x)=1\cdot x$.
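The closed-form result can be double-checked with a short Python sketch. The function names are hypothetical, and the data points $(1,1), (2,2), (3,3)$ are read off from the values inserted into the formula above.

```python
def fit_slope(xs, ys):
    """Closed-form minimizer: theta_1 = sum(y_i * x_i) / sum(x_i^2)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

def cost_derivative(theta_1, xs, ys):
    """J'(theta_1) = (1/m) * sum((theta_1 * x_i - y_i) * x_i)."""
    m = len(xs)
    return sum((theta_1 * x - y) * x for x, y in zip(xs, ys)) / m

# Data inferred from the worked numbers: x = (1, 2, 3), y = (1, 2, 3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta_hat = fit_slope(xs, ys)
print(theta_hat)                           # 1.0
print(cost_derivative(theta_hat, xs, ys))  # 0.0
```

The derivative vanishing at `theta_hat` confirms that it is a stationary point of $J$, and since $J$ is a convex quadratic in $\theta_1$, this is the minimum.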
Are the steps comprehensible, and do they answer your questions? If not, feel free to ask.