You're asking about doing linear regression with the $L_{\infty}$ norm or, equivalently, the Chebyshev approximation criterion, rather than the usual $L_2$ norm that minimizes the sum of squared residuals.
There isn't a nice closed-form formula that will give you $\alpha$ and $\beta$. Instead, the standard approach is to obtain $\alpha$ and $\beta$ as the solution to a linear programming problem. The formulation is
$$\text{Minimize } r$$
subject to
$$r - (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i,$$
$$r + (y_i - \alpha - \beta x_i ) \geq 0, \text{ for each } i.$$
The variables are $r$ (the maximum residual), $\alpha$, and $\beta$, and the $(x_i, y_i)$ are the data values that become parameters in the LP formulation.
Here's an example, although the author assumes a model with $\alpha = 0$.
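Separately, here is a minimal sketch of the full LP above (including the intercept $\alpha$) using SciPy's `linprog`; the data values are made up for illustration.

```python
# Chebyshev (L-infinity) line fit as a linear program.
# Decision variables, in order: [r, alpha, beta].
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # made-up data
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

c = [1.0, 0.0, 0.0]  # minimize r, the maximum absolute residual

# Rewrite the two constraint families in A_ub @ v <= b_ub form:
#   r - (y_i - alpha - beta*x_i) >= 0  ->  -r - alpha - beta*x_i <= -y_i
#   r + (y_i - alpha - beta*x_i) >= 0  ->  -r + alpha + beta*x_i <=  y_i
ones = np.ones_like(x)
A_ub = np.vstack([
    np.column_stack([-ones, -ones, -x]),
    np.column_stack([-ones, ones, x]),
])
b_ub = np.concatenate([-y, y])

# r >= 0; alpha and beta are unrestricted in sign.
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (None, None), (None, None)])
r, alpha, beta = res.x
print(f"max residual r = {r:.4f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```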
Continuing with your simplifying assumptions that $\bar x=0$ and $\bar y=0$, the standard least-squares solution for the slope is
$$
\frac{\sum_{i=1}^N x_iy_i}{\sum_{i=1}^Nx_i^2}\;.
$$
Assuming no $x_i$ is zero, we can write this as
$$
\frac{\sum_{i=1}^N x_i^2\frac{y_i}{x_i}}{\sum_{i=1}^Nx_i^2}\;.
$$
So it's actually a weighted average of the ratios $\frac{y_i}{x_i}$, with weights $x_i^2$, not as different from your proposed solution as you perhaps thought it was.
The question remains why the weights $x_i^2$ in the standard solution are better than the equal weights that you propose to use. The reason is that under the standard assumption that the $y_i$ all have the same additive error, the errors of values near the origin get amplified when you take the ratio $\frac{y_i}{x_i}$ with small values of $x_i$. It's intuitively clear that shifting a data point near the origin by a certain vertical error changes the ratio more than shifting a data point further away by the same amount; so the ratios for small $x_i$ are more uncertain and should carry less weight.
In fact, this can be stated more quantitatively. If you perform a linear regression with different error bars for the different data points, you find that each data point should be weighted with the inverse of its variance, that is, the inverse square of its standard deviation. Forming the ratio $\frac{y_i}{x_i}$ amplifies the error in $y_i$ by a factor $\frac1{x_i}$. If we assume that the errors in the $y_i$ are all the same, the errors in the ratios are therefore proportional to $\frac1{x_i}$, and the weights should be proportional to the inverse squares of those errors, that is, to $x_i^2$. So the standard formula is in fact just your formula, properly weighted.
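As a quick numerical sanity check of this equivalence, here is a small sketch in Python; the data values are made up for illustration, and note that the ratio form requires every $x_i \neq 0$.

```python
# Check: the x_i^2-weighted average of the ratios y_i/x_i equals the
# least-squares slope sum(x_i y_i) / sum(x_i^2).
import numpy as np

x = np.array([0.5, 1.0, 2.0, 3.0, 4.0])   # made-up data, all nonzero
y = np.array([0.4, 1.1, 1.9, 3.2, 3.9])

slope_ls = np.sum(x * y) / np.sum(x ** 2)      # standard formula
slope_w = np.average(y / x, weights=x ** 2)    # weighted ratio average

print(slope_ls, slope_w)  # both print the same value
```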
Best Answer
Part of the issue is whether you want your function to fit the data "as closely as possible", or to pass through every data point exactly.
For example, if you want to fit some data that appears linear, using linear least squares to find the two coefficients that minimize the error is the right way to go. However, if you want an exact fit through every point, you might want to look at Lagrange interpolation.
It sounds like you want an "as close as possible" fit, but you want to compare the accuracy of polynomials of different degrees. You can use least squares techniques to find the coefficients of a polynomial of a given degree. To do this, you will use a matrix containing powers of your data points and a vector containing your coefficients.
Say we have $d$ data points, and we want a degree-$n$ polynomial. Then our matrix will have $d$ rows and $n+1$ columns: the $i$th row contains the powers, $0$ through $n$, of the $i$th data point. The vector contains the constant term, then the linear coefficient, and so on.
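Concretely, the matrix is a Vandermonde matrix, and the product $V\mathbf{c}$ evaluates the polynomial at every data point:
$$
V = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_d & x_d^2 & \cdots & x_d^n
\end{pmatrix},
\qquad
\mathbf{c} = \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix}.
$$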
Multiplying the matrix by the coefficient vector gives a vector of dimension $d$ (independent of the degree of the polynomial used!). Typically these objects are used to find the coefficients that minimize the error, but once you have the best coefficients for a given degree, you can multiply the matrix by the coefficient vector and subtract the vector containing the $y$-values. The norm of the residual vector $V\mathbf{c} - \mathbf{y}$ is the square root of the sum of the squared errors at the data points.
If you compute this norm for several different degrees, you can see which degree gives the lowest error, and that polynomial should be the closest approximation among the degrees tested.
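Here is a minimal sketch of that procedure in NumPy; the data and the range of degrees are made up for illustration.

```python
# Fit polynomials of several degrees by least squares and compare the
# residual norms ||V c - y|| described above.
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])  # made-up data
y = np.array([1.1, 1.4, 2.3, 3.1, 5.2, 7.4, 10.1])

for n in range(1, 5):
    V = np.vander(x, n + 1, increasing=True)       # d rows, n+1 columns of powers
    coefs, *_ = np.linalg.lstsq(V, y, rcond=None)  # best coefficients for degree n
    err = np.linalg.norm(V @ coefs - y)            # sqrt of sum of squared residuals
    print(f"degree {n}: residual norm = {err:.4f}")
```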
Best of luck!