This post is an honest response to a common problem in the textbook presentation of regression, namely, the issue of what is random or fixed. Regression textbooks typically blithely state that the $X$ variables are fixed and go on their merry way, when in practice this assumption eliminates most of the interesting regression applications.
Rather than assume the $X$ variables are fixed, a better route to understanding regression analysis is to take a conditional distribution approach, one where the $X$'s are assumed random throughout, and then the case of fixed $X$ (which occurs only in very narrow experimental designs, and at that only when the experiment is performed without error) is subsumed as a special case where the distributions are degenerate.
What the OP is missing is the link from random $X$ to fixed realizations of $X$ ($X=x$), which all starts from the
Law of Total Expectation: Assume $U$ and $V$ are random, with finite expectation. Let $E(U | V=v) = \mu(v)$. Then $E(U) = E\{\mu(V)\}$.
This "Law" (which is actually a mathematical theorem) allows you to prove unbiasedness of the estimator $\hat \beta$ in two steps: (i) by first showing that it is unbiased conditional on the $X$ data, and (ii) by using the Law of Total Expectation to then show that it is unbiased when averaged over all possible realizations of the $X$ data. (The conditional mean equals $\beta$ for every realization $x$, and the average of a constant is that constant: the average of 11, 11, 11, 11, 11, 11, ... is 11.)
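To make the two steps concrete, here is a sketch for the usual least-squares estimator. The assumptions are mine, not the OP's: the linear model $Y = X\beta + \varepsilon$ written in matrix form (so $X$ here is the whole design matrix), with $E(\varepsilon | X = x) = 0$ and $X^\top X$ invertible, so that $\hat \beta = (X^\top X)^{-1} X^\top Y$.

$$
\begin{aligned}
\text{(i)}\quad \mu(x) = E(\hat \beta \mid X = x) &= (x^\top x)^{-1} x^\top E(Y \mid X = x) = (x^\top x)^{-1} x^\top x \, \beta = \beta, \\
\text{(ii)}\quad E(\hat \beta) &= E\{\mu(X)\} = E(\beta) = \beta .
\end{aligned}
$$

Step (i) treats the realization $x$ as fixed, exactly as the textbooks do; step (ii) is the Law of Total Expectation with $U = \hat \beta$ and $V = X$.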
Answers to the OP:
Q1. Do we treat $(X_i,Y_i)$'s as random variables?
A1. Yes. They are random in the sense of the model, which describes the way that potentially observable values of such data might appear. Of course the actual observed data, $(x_i, y_i)$, are not random. Instead, they are fixed values, one of many possible realizations of the potentially observable random variables $(X_i, Y_i)$. In rare cases, the $X$ data are fixed, but this is covered as a special case of randomness, so it is easier and safer just to assume randomness always.
Q2. Do we treat $\beta_0$ and $\beta_1$ as random variables?
A2. This is somewhat off topic from the OP, but still a very important question. From the scientist's conceptualization of reality, these are ordinarily fixed values. That is, the scientist assumes that there is a rigid structure responsible for the production of all of the $(Y_i | X_i = x_i)$ data values, and these $\beta_0, \beta_1$ values are part of that rigid structure.
Now, the parameters $\beta_0, \beta_1$ are uncertain in the scientist's mind (which is why he or she is collecting data in the first place!), so the scientist may choose to view them, mentally, as "random." The scientist has some ideas about the possible values of these parameters based on logic, subject matter considerations, and past data, and these ideas form the scientist's "prior distribution." The scientist then may update this prior using current data to obtain her/his posterior. That, in a nutshell, is what Bayesian statistics is all about.
But again, that issue is a little off topic from the OP, so let's consider everything conditional on the scientist's conceptualization that there is a rigid structure, and that these $\beta_0, \beta_1$ values are fixed in reality. In other words, all of my replies other than this one assume that the $\beta$'s are fixed.
Q3. Do we treat $\hat \beta_0$ and $\hat \beta_1$ as random variables?
A3. Here is another place where typical regression teaching sources are slippery. In some cases, they refer to the estimates $\hat \beta_0$ and $\hat \beta_1$ as functions of the (fixed) data that has been collected, and sometimes they refer to them as functions of the (random) potentially observable data, but use the same symbols $\hat \beta_0$ and $\hat \beta_1$ in either case. Often, you just have to understand from context which is which.
Whenever you see $E(\hat \beta)$, you can assume that $\hat \beta$ is a function of the random data, i.e., that $\hat \beta$ is a function of the $(X_i, Y_i)$.
Whenever you see the value of $\hat \beta$ reported, e.g., following a computer printout of results from a regression analysis, you can assume that $\hat \beta$ is a function of the fixed data sample, i.e., that $\hat \beta$ is a function of the $(x_i, y_i)$.
Q4. What can have an expected value and what can't (what gets treated as a constant when finding expected values) and why?
A4. Anything can have an expectation. Some things are more interesting than others, though. Anything that is fixed (like a $\hat \beta$ that is a function of the observed $(x_i, y_i)$ sample) has an expectation that is just equal to that value. For example, if you observe from your computer printout that $\hat \beta_1 = 0.23$, then $E(\hat \beta_1) = 0.23$. But that is not interesting.
What is more interesting is the following question: over all possible potential realizations of $(X_i, Y_i)$ from this data-generating process, is the estimator $\hat \beta_1$ neither systematically too large, nor systematically too small, in an average sense, when compared to the structural parameter $\beta_1$? The expression
$E(\hat \beta_1) = \beta_1$ tells you that the answer to that question is a comforting "yes."
And in that expression $E(\hat \beta_1) = \beta_1$, it is implicit that $ \hat \beta_1$ is a function of the potentially observable $(X_i, Y_i)$ data, not the sample $(x_i, y_i)$ data.
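A small simulation can illustrate the difference between one realized estimate and the average over potential realizations. This is only a sketch under assumptions I am making up for illustration: the "true" values $\beta_0 = 1$, $\beta_1 = 0.23$, standard normal $X$ and errors, and the helper names `ols_slope` and `simulate_slopes` are all hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate_slopes(beta0=1.0, beta1=0.23, n=50, reps=5000, seed=0):
    """Draw `reps` independent samples of (X_i, Y_i) and return the slope from each."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)       # X is random: redrawn in every replication
        eps = rng.normal(size=n)     # errors drawn independently of X (assumed)
        y = beta0 + beta1 * x + eps  # rigid structure with fixed beta0, beta1
        slopes[r] = ols_slope(x, y)
    return slopes

slopes = simulate_slopes()
# A single realized estimate: a fixed number, like the one on a printout.
print("one realized estimate:", slopes[0])
# Average over many potential realizations: approximates E(beta1_hat).
print("average over realizations:", slopes.mean())
```

Any single entry of `slopes` can be well away from 0.23, while the average over replications should sit close to it. That average is what $E(\hat \beta_1) = \beta_1$ refers to.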
Best Answer
In practice the difference is huge. The exogeneity assumption that you refer to requires that the errors are not correlated with the regressors. If they are correlated, then you can't rely on regressions with stochastic regressors.
For instance, in observational studies, such as pretty much all of economics, you do not control the regressors. You cannot set US GDP to a desired level; you can only observe it. Hence, in a model where GDP is a regressor, you want the errors to be independent of GDP, because in this model you can only assume stochastic regressors.
When your errors are correlated with the regressors, you get an endogeneity issue. There are ways to handle it, such as using lagged regressors or instrumental variables.
In econometrics a textbook example is the impact of the exogenous price on the demand. We're talking about typical demand-supply equations. Here, the problem is that the prices also depend on the supply. Hence, there is an endogeneity issue, which any econometrician will promptly point out. This is to answer your question on feasibility of testing the assumption.
Once you have figured out that endogeneity is present, you may look for a so-called instrumental variable. These are variables that are correlated with the price but do not affect demand other than through the price, i.e. something that may impact the supply, for instance. If the demand is for oranges, then maybe the temperature in Florida in spring would be a suitable instrument, because it's going to impact the supply of oranges, and hence the price, but not the demand. So, you plug this instrument into the regression and tease out the impact of the price on demand.
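Here is a toy simulation of the orange example, as a sketch only. Everything in it is an assumption made for illustration: the data-generating process, the coefficients, and the names `simulate_market` and `sample_cov` are hypothetical, and the estimator shown is the simple one-instrument ratio form of IV, which may not be the exact procedure a given software package uses.

```python
import numpy as np

def simulate_market(n=100_000, price_effect=-1.0, seed=0):
    """Toy market in which the price is endogenous in the demand equation."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # instrument: standardized spring temperature (assumed)
    u = rng.normal(size=n)  # demand shock, i.e. the error in the demand equation
    v = rng.normal(size=n)  # other price noise
    # Price reacts to the instrument AND to the demand shock: this is the endogeneity.
    p = 2.0 - 0.8 * z + 0.5 * u + v
    # Demand depends on the price and the shock; z enters only through p.
    q = 10.0 + price_effect * p + u
    return z, p, q

def sample_cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

z, p, q = simulate_market()

# OLS slope of q on p: biased, because p is correlated with the error u.
ols = sample_cov(p, q) / sample_cov(p, p)
# IV slope: uses only the variation in p that is driven by the instrument z.
iv = sample_cov(z, q) / sample_cov(z, p)

print("true price effect: -1.0")
print("OLS estimate:", ols)
print("IV estimate: ", iv)
```

In this setup the OLS slope is pulled toward zero because demand shocks push the price up, while the IV ratio should land near the assumed effect of $-1$. The result depends entirely on the instrument being unrelated to the demand shock, which is imposed here by construction and has to be argued on subject-matter grounds with real data.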