Since your temperature readings aren't equispaced, you can't directly apply Simpson's rule; the equivalent approach is to construct the parabola that interpolates three consecutive points (i.e., spanning two panels), and integrate that. The problem with this approach, of course, is that it requires an odd number of data points (an even number of panels).
You can, of course, use the trapezoidal rule across each panel, but a probably better idea is to construct a cubic Hermite interpolant for each panel, and then integrate that. An obvious problem is that four conditions are needed to uniquely determine a cubic on each panel (two point values and two derivative values), but derivative estimates can be constructed from the data such that the piecewise interpolant is locally monotonic; briefly, a piecewise interpolant is locally monotonic if there are no spurious inflection points or extrema within a panel. One approach to estimating derivative values for monotonic interpolation, due to Fritsch and Carlson, is implemented in FORTRAN as the `pchip` package, and in MATLAB as the function `pchip`. A more modern approach, which may be better in some situations, is due to Steffen. You may have to experiment to see which of the trapezoidal rule, Fritsch-Carlson, or Steffen is best for integrating your data.
Let me detail the way one uses cubic Hermite interpolation for integrating data:
Each panel is bounded by two points, $(x_i,y_i)$ and $(x_{i+1},y_{i+1})$. The trapezoidal rule consists of constructing the line joining these two points (linear interpolation):
$$f_i(x)=\frac{x_{i+1}-x}{x_{i+1}-x_i}y_i+\frac{x-x_i}{x_{i+1}-x_i}y_{i+1}$$
and integrating that:
$$\int_{x_i}^{x_{i+1}}f_i(x)\mathrm dx=\frac{x_{i+1}-x_i}{2}(y_i+y_{i+1})$$
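As a quick sketch of this formula in code (the arrays `x` and `y` are hypothetical sample data of my own choosing), summing the panel integrals handles non-equispaced points directly:

```python
import numpy as np

# Hypothetical non-equispaced sample points; y samples a known
# function so the answer can be checked against the exact integral.
x = np.array([0.0, 0.5, 1.2, 2.0, 2.3, 3.0])
y = x**2

# Sum of (x_{i+1} - x_i)/2 * (y_i + y_{i+1}) over all panels.
total = np.sum((x[1:] - x[:-1]) / 2.0 * (y[:-1] + y[1:]))

print(total)  # the exact integral of x^2 over [0, 3] is 9
```

Since $x^2$ is convex, the trapezoidal estimate here slightly overestimates the exact value of $9$.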
Integrating with cubic Hermite interpolation can be considered as a further "improvement" of the trapezoidal rule; briefly, in addition to points, one has derivative values $y_i^{\prime}$ and $y_{i+1}^{\prime}$, from which one constructs the cubic
$$g_i(x)=y_i+y_i^{\prime}(x-x_i)+c_i(x-x_i)^2+d_i(x-x_i)^3$$
where
$\displaystyle c_i=\frac{3\frac{y_{i+1}-y_i}{x_{i+1}-x_i}-2y_i^{\prime}-y_{i+1}^{\prime}}{x_{i+1}-x_i}$ and $\displaystyle d_i=\frac{y_i^{\prime}+y_{i+1}^{\prime}-2\frac{y_{i+1}-y_i}{x_{i+1}-x_i}}{(x_{i+1}-x_i)^2}$.
Integrating $g_i(x)$ might look slightly complicated; however, there is a nice expression for the integral:
$$\int_{x_i}^{x_{i+1}}g_i(x)\mathrm dx=\frac{x_{i+1}-x_i}{6}\left(y_i+4g_i\left(\frac{x_{i+1}+x_i}{2}\right)+y_{i+1}\right)$$
whose verification I'll leave up to you.
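Putting the pieces together, here is a minimal sketch (function and variable names are my own) that, given points and derivative estimates, integrates the piecewise cubic Hermite interpolant one panel at a time using the Simpson-like expression above:

```python
import numpy as np

def hermite_integral(x, y, dy):
    """Integrate the piecewise cubic Hermite interpolant defined by
    points (x[i], y[i]) and derivative values dy[i]."""
    total = 0.0
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        delta = (y[i + 1] - y[i]) / h  # panel slope
        # Coefficients of g_i(x) = y_i + y'_i (x-x_i) + c (x-x_i)^2 + d (x-x_i)^3
        c = (3 * delta - 2 * dy[i] - dy[i + 1]) / h
        d = (dy[i] + dy[i + 1] - 2 * delta) / h**2
        # Value of the cubic at the panel midpoint
        t = h / 2
        g_mid = y[i] + dy[i] * t + c * t**2 + d * t**3
        # Simpson-like formula: exact for the cubic g_i
        total += h / 6 * (y[i] + 4 * g_mid + y[i + 1])
    return total

# Check on f(x) = x^3 with exact derivatives: the Hermite cubic then
# reproduces f exactly, so the integral over [0, 2] should be 4.
x = np.array([0.0, 0.7, 1.1, 2.0])
print(hermite_integral(x, x**3, 3 * x**2))  # prints 4.0 (up to rounding)
```

In practice, the derivative values `dy` would come from a monotone estimator such as Fritsch-Carlson or Steffen; if you use scipy, `scipy.interpolate.PchipInterpolator(x, y).integrate(x[0], x[-1])` does the whole job in one call.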
I'll just add the note that in the case of equispaced data, integration with a piecewise cubic Hermite interpolant is equivalent to integration with the trapezoidal rule plus corrections at the beginning and end.
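To make this equivalence concrete: direct integration of $g_i$ shows that on an equispaced grid each panel contributes $\frac{h}{2}(y_i+y_{i+1})+\frac{h^2}{12}(y_i^{\prime}-y_{i+1}^{\prime})$, and the derivative terms telescope across panels, leaving the trapezoidal sum plus a correction involving only the end derivatives. A quick numerical check of this (sample data of my own choosing):

```python
import numpy as np

# Equispaced grid with smooth data and derivative values.
n, a, b = 8, 0.0, 2.0
x = np.linspace(a, b, n + 1)
h = (b - a) / n
y = np.sin(x)
dy = np.cos(x)

# Per-panel Hermite integral on an equispaced grid:
# h/2 (y_i + y_{i+1}) + h^2/12 (y'_i - y'_{i+1})
hermite = np.sum(h / 2 * (y[:-1] + y[1:]) + h**2 / 12 * (dy[:-1] - dy[1:]))

# The derivative terms telescope: trapezoidal rule + end corrections.
trapezoid = np.sum(h / 2 * (y[:-1] + y[1:]))
corrected = trapezoid + h**2 / 12 * (dy[0] - dy[-1])

print(hermite, corrected)  # the two agree to rounding error
```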
I wouldn't agree that direct techniques always use a "formula". For instance, in solving linear systems of equations "directly", one typically doesn't use a closed-form solution, but instead something like LU or Cholesky decomposition.
Iterative solvers, on the other hand, would be things like Gauss-Seidel iteration, SOR (successive over-relaxation), Krylov subspace methods like conjugate gradients, or multilevel methods.
So what's the big difference? In the case of linear systems, I'd say the following are two of the main differences:
With a direct method, you have to do a certain amount of work, and then you obtain your solution. For instance, if you're doing LU factorization, you can't stop halfway through and say "alright, that's good enough". (There is something called incomplete LU, but it's actually used in iterative methods!) To get a solution, you need to run the algorithm to the end. Typically, the solution you get will then be close to the exact one.
With iterative methods, you always update your old guess and get (hopefully) a bit closer to the true solution. This means that you can decide how much work you want to invest depending on how accurate you need your solution to be. For instance, CG is known to converge to the exact solution in $n$ steps for a matrix of size $n$, and was historically first seen as a direct method because of this. However, after a while people figured out that it works really well if you just stop the iteration much earlier - often you will get a very good approximation after far fewer than $n$ steps. In fact, we can analyze how fast CG converges. The end result is that CG is used as an iterative method for large linear systems today.
The second big difference is that, for a direct method, you generally need to have the entire matrix stored in memory. Not so in iterative methods: here, you typically only need a way to compute the application of the matrix to a vector, i.e., the matrix-vector product. So if you have a very large matrix, but you can relatively quickly compute its application to a vector (for instance, because the matrix is very sparse or has some other kind of structure), you can use this to your advantage.
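To illustrate the matrix-free point, here is a sketch using scipy's conjugate gradient solver with a `LinearOperator`: the solver only ever asks for matrix-vector products, so the matrix itself is never stored. The example applies the 1-D discrete Laplacian (a tridiagonal matrix with $2$ on the diagonal and $-1$ off it) implicitly:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100

def laplacian_matvec(v):
    """Apply the 1-D discrete Laplacian without storing the matrix."""
    out = 2.0 * v
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return out

# All CG needs is the matrix-vector product.
A = LinearOperator((n, n), matvec=laplacian_matvec, dtype=np.float64)
b = np.ones(n)

x, info = cg(A, b)  # info == 0 signals convergence
print(info, np.linalg.norm(laplacian_matvec(x) - b))
```

Note that CG stops as soon as the residual drops below its tolerance: with an iterative method, the accuracy is something you choose, not something the factorization dictates.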
One way to improve a numerical integration is to refine the mesh of sample points. Using a number of sample points that is a power of $2$ makes this easier. The idea is to compute the integral using $n$ points and then with $2n$ points (reusing the work done for the $n$ original points). If the two values are close, then you're done. Otherwise, keep on refining.
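A sketch of this doubling strategy for the composite trapezoidal rule (the new points at each level fall halfway between the old ones, so only the new function values need to be computed; the helper name is my own):

```python
import math

def refine_trapezoid(f, a, b, tol=1e-8, max_levels=20):
    """Composite trapezoidal rule, doubling the number of panels until
    two successive estimates agree to within tol."""
    h = b - a
    estimate = h / 2 * (f(a) + f(b))  # single panel
    n = 1
    for _ in range(max_levels):
        # Evaluate f only at the n new midpoints of the current panels;
        # the old estimate is reused via T_{2n} = T_n/2 + (h/2) * midsum.
        midsum = sum(f(a + (i + 0.5) * h) for i in range(n))
        new_estimate = estimate / 2 + (h / 2) * midsum
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        estimate, n, h = new_estimate, 2 * n, h / 2
    return estimate

print(refine_trapezoid(math.sin, 0.0, math.pi))  # exact value is 2
```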
A variant of this idea is an adaptive method. See for instance the Adaptive Simpson's method.
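For completeness, a compact sketch of adaptive Simpson's method: an interval is accepted only if halving it changes the Simpson estimate by less than its share of the tolerance, so the effort is concentrated where the integrand varies fastest (here, near the singular derivative of $\sqrt{x}$ at $0$):

```python
import math

def simpson(f, a, b):
    """Simpson's rule on a single interval."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def adaptive_simpson(f, a, b, tol=1e-8):
    """Recursively subdivide until the local error estimate is below tol."""
    m = (a + b) / 2
    whole = simpson(f, a, b)
    left, right = simpson(f, a, m), simpson(f, m, b)
    # Standard error estimate: compare the two halves against the whole.
    if abs(left + right - whole) < 15 * tol:
        return left + right + (left + right - whole) / 15
    return adaptive_simpson(f, a, m, tol / 2) + adaptive_simpson(f, m, b, tol / 2)

print(adaptive_simpson(math.sqrt, 0.0, 1.0))  # exact value is 2/3
```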