Difference between 1st and 2nd order algorithms:
Any algorithm that requires at least one first-derivative/gradient is a first order algorithm. In the case of a finite sum optimization problem, you may use only the gradient of a single sample, but this is still first order because you need at least one gradient.
A second order algorithm is any algorithm that uses any second derivative (in the scalar case, simply $f''$). To elaborate, Newton's step requires the Hessian, whose entries are second order derivatives, i.e. $H_{i,j} = \partial_i \partial_j f(x)$.
In general, an algorithm can be classified as an $n^{th}$ order optimization algorithm if it uses the $n^{th}$ derivatives, which form a tensor of rank $n$.
Important distinction:
When an algorithm uses an approximation of the second order information (the Hessian) for optimization, it may or may not be a second order algorithm. I am emphasizing this because, for example, you may approximate the Hessian by the outer product of the gradient, $H \approx \nabla f(x) \nabla f(x)^T$. In this case you are only using first order information, so this would be considered a first order algorithm. On the other hand, if you approximate the Hessian by computing only its diagonal elements $\partial_i \partial_i f(x)$, this would be a second order algorithm.
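To make the distinction concrete, here is a minimal sketch (assuming NumPy and a toy quadratic $f(x) = \tfrac12 x^T A x$, both of which are my own illustrative choices) that builds the gradient outer-product surrogate next to a finite-difference diagonal Hessian: the first uses only gradient evaluations, the second genuinely touches second derivatives.

```python
# Contrast the two Hessian surrogates discussed above on f(x) = 0.5 * x^T A x.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(x):
    # Exact gradient of f(x) = 0.5 * x^T A x (A symmetric)
    return A @ x

def hessian_diag(x, eps=1e-5):
    # Diagonal of the Hessian via finite differences of the gradient:
    # this uses second order information, d^2 f / dx_i^2.
    n = len(x)
    d = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        d[i] = (grad(x + eps * e)[i] - grad(x)[i]) / eps
    return np.diag(d)

x = np.array([1.0, -2.0])
g = grad(x)

H_outer = np.outer(g, g)   # "first order" surrogate: only gradient evaluations
H_diag = hessian_diag(x)   # "second order" surrogate: (approximate) second derivatives

print(H_outer)
print(H_diag)
```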
The order of an algorithm can also be thought of as the order of the error that results from approximating the derivatives. Recall the finite-difference approximation of the first derivative:
$$\partial_i f(x) \approx \frac{f(x + \epsilon \Delta x_i) - f(x)}{\epsilon}$$
and the second derivative:
$$\partial_{ij} f(x) \approx \frac{\partial_j f(x + \epsilon \Delta x_i) - \partial_j f(x)}{\epsilon},$$
where $\Delta x_i$ is the vector whose $i^{th}$ coordinate is $1$ and whose other coordinates are $0$. Hence if you use an $\epsilon$-approximation for each derivative computation, you inherently introduce an $\epsilon^2$ error into your optimization algorithm.
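As a quick numerical illustration of these finite-difference formulas, the following sketch (a toy example of my own, using $f(x) = \sin x$, whose exact derivatives are known) compares the approximations against the true first and second derivatives for a few values of $\epsilon$; note that the second-derivative approximation is itself built from approximated first derivatives, which is where the compounding of error comes from.

```python
import math

def f(x):
    return math.sin(x)

def d1(x, eps):
    # First derivative via the forward difference from the text.
    return (f(x + eps) - f(x)) / eps

def d2(x, eps):
    # Second derivative: forward difference of the (already approximated)
    # first derivative, so the epsilon errors compound.
    return (d1(x + eps, eps) - d1(x, eps)) / eps

x0 = 0.7
for eps in (1e-2, 1e-3, 1e-4):
    err1 = abs(d1(x0, eps) - math.cos(x0))   # exact f'(x)  = cos(x)
    err2 = abs(d2(x0, eps) + math.sin(x0))   # exact f''(x) = -sin(x)
    print(f"eps={eps:g}  first-derivative error={err1:.2e}  second-derivative error={err2:.2e}")
```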
Since it is touched on in your question, I would like to add that first order algorithms do not necessarily converge linearly: Stochastic Gradient Descent (SGD) converges sub-linearly, and many second order algorithms do not converge quadratically.
Best Answer
The Hessian is used both for seeking an extremum (by Newton-Raphson) and to test if an extremum is a min/max (if the Hessian is pos/neg definite). I assume you want to look at the first, say in ${\Bbb R}^n$:
Suppose $a=(a_1,\ldots,a_n)$ is a 'guess' for a point where your function $f(x)$ has an extremum. If it is not exact, then $\nabla f(a)$ (a column vector) is not exactly zero, so we want to improve the guess.
Now, assuming $f$ is smooth enough, if we look at the derivative (gradient) at $x=a+h$, then by Taylor expansion
$$ \frac{\partial f (a+h)}{\partial x_i} = \frac{\partial f (a)}{\partial x_i} + \sum_{j=1}^n \frac{\partial^2 f (a)}{\partial x_i \partial x_j} h_j + \ldots $$
or, in matrix form,
$$ \nabla f(a+h) = \nabla f(a) + H_f(a)\, h + \ldots $$
As we want the left-hand side to vanish, we choose $h$ so that the right-hand side vanishes. This is just a standard linear system, so
$$ h = - \left( H_f(a)\right)^{-1} \nabla f(a). $$
Our new guess is then $a+h$, and we continue from there.
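A minimal sketch of this iteration, assuming NumPy and a hand-picked quadratic test function (both are my own illustrative choices, not from the original answer):

```python
# Newton-Raphson on the toy function f(x, y) = (x - 1)^2 + 2*(y + 3)^2,
# with hand-coded gradient and Hessian.
import numpy as np

def grad_f(a):
    x, y = a
    return np.array([2.0 * (x - 1.0), 4.0 * (y + 3.0)])

def hess_f(a):
    # Constant Hessian for this quadratic example.
    return np.array([[2.0, 0.0],
                     [0.0, 4.0]])

a = np.array([5.0, 5.0])   # initial guess
for _ in range(5):
    # h = -H_f(a)^{-1} grad f(a), solved as a linear system rather than
    # forming the inverse explicitly.
    h = -np.linalg.solve(hess_f(a), grad_f(a))
    a = a + h              # new guess a + h
    print(a, np.linalg.norm(grad_f(a)))
# For a quadratic, a single Newton step already lands on the minimizer (1, -3).
```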