Initially, before the massive edits, your question asked about the definition of bias. Quoting my other answer:
Let $X_1,\dots,X_n$ be your sample of independent and identically
distributed random variables from distribution $F$. You are interested
in estimating unknown but fixed quantity $\theta$, using
estimator $g$ being a function of $X_1,\dots,X_n$. Since $g$ is a
function of random variables, estimate
$$ \hat\theta_n = g(X_1,\dots,X_n)$$
is also a random variable. We define bias as
$$ \mathrm{bias}(\hat\theta_n) = \mathbb{E}_\theta(\hat\theta_n) -
\theta $$
and the estimator is unbiased when $\mathbb{E}_\theta(\hat\theta_n) = \theta$.
This is the definition of bias in statistics (it is the one referred to in the bias-variance tradeoff). As you and others noted, people use the term "bias" for many different things: in machine learning we have sampling bias and bias nodes in neural networks, outside statistics there are cognitive biases, you mentioned bias in electrical engineering, and so on. However, if you are looking for some deeper philosophical connection between those concepts, then I'm afraid you are looking too far.
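To make the statistical definition concrete, here is a minimal Python sketch (not part of the original question or answer) that approximates $\mathbb{E}_\theta(\hat\theta_n) - \theta$ by Monte Carlo for two estimators of the variance of a standard normal sample. The function names, the sample size `n = 10`, and the number of replications are my own illustrative choices.

```python
import random
import statistics

def plugin_variance(sample):
    """Variance estimate that divides by n (the biased plug-in estimator)."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

def estimate_bias(estimator, theta, n=10, reps=20000, seed=0):
    """Monte Carlo approximation of E[estimator] - theta for N(0, 1) samples."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return sum(estimates) / reps - theta

# The true variance of N(0, 1) is theta = 1.
bias_plugin = estimate_bias(plugin_variance, theta=1.0)
bias_unbiased = estimate_bias(statistics.variance, theta=1.0)  # divides by n - 1

print(f"divide by n:     estimated bias {bias_plugin:+.3f}")
print(f"divide by n - 1: estimated bias {bias_unbiased:+.3f}")
```

In theory the first estimator has bias $-\sigma^2/n = -0.1$ here and the second has bias zero; the simulated values should land close to those, up to Monte Carlo noise.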
Regarding the "bias" shown in your examples
TL;DR: The models you compare may not illustrate what you wanted to show and may be misleading. They illustrate omitted-variable bias, rather than some kind of OLS bias in general.
Your first example is a textbook example of a linear regression model
$$ y_i \sim \mathcal{N}(\alpha + \beta x_i, \;\sigma) $$
where $Y$ is a random variable and $X$ is fixed. In your second example you use
$$
x_i \sim \mathcal{N}(z_i, \;\sigma) \\
y_i \sim \mathcal{N}(z_i, \;\sigma)
$$
so $X$ and $Y$ are both random variables that are conditionally independent given $Z$. You want to model the relationship between $Y$ and $X$, and you seem to expect a slope equal to unity, as if $Y$ depended on $X$, which is not true by the design of your example. To convince yourself, take a closer look at your model. Below I simulate data similar to yours, with the difference that $Z$ is uniformly distributed, since to me this seems more realistic than using a deterministic variable (it will also make things easier later on), so the model becomes
$$
z_i \sim \mathcal{U}(0, 10) \\
x_i \sim \mathcal{N}(z_i, \;\sigma) \\
y_i \sim \mathcal{N}(z_i, \;\sigma)
$$
The plots below show the simulated data: the first shows $X$ vs $Z$; the second $Y$ vs $Z$; the third $X$ vs $Y$ with the fitted regression line; and the last shows $X$ vs the residuals from that regression (a pattern similar to yours). The dependence of $X$ and $Y$ on $Z$ is obvious, while the dependence between $X$ and $Y$ is illusory once you account for the variable $Z$ that they both depend on. We call this omitted-variable bias.
This becomes even clearer if we look at the regression results:
    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.7371 -0.9900  0.0036  0.9293  4.1523

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.5842     0.1199   4.872 1.49e-06 ***
    x             0.8827     0.0206  42.856  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.393 on 498 degrees of freedom
    Multiple R-squared:  0.7867,  Adjusted R-squared:  0.7863
    F-statistic:  1837 on 1 and 498 DF,  p-value: < 2.2e-16
and compare them to the results of a model that includes $Z$:
    Call:
    lm(formula = y ~ x + z)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.5871 -0.7032 -0.0118  0.6028  3.1817

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.03394    0.09146   0.371    0.711
    x           -0.01049    0.04532  -0.232    0.817
    z            1.00824    0.04825  20.895   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.018 on 497 degrees of freedom
    Multiple R-squared:  0.8864,  Adjusted R-squared:  0.886
    F-statistic:  1940 on 2 and 497 DF,  p-value: < 2.2e-16
In the first case we see a strong and significant slope for $X$ and $R^2 = 0.79$ (nice!). Notice, however, what happens when we add $Z$ to the model: the slope for $X$ shrinks almost to zero and becomes insignificant, while the slope for $Z$ is large and significant, and $R^2$ increases to $0.89$. This shows that it was $Z$ that "caused" the relationship between $X$ and $Y$, since controlling for it "takes out" all of $X$'s apparent influence.
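If you want to reproduce this comparison yourself, the following is a rough Python sketch of the same experiment. It is not the code that produced the output above: the seed, the noise level `sigma = 1` (the value of $\sigma$ is not stated in the example), and the small `ols` helper are my own assumptions, so the numbers will differ slightly from the ones shown.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(1)
n, sigma = 500, 1.0  # sigma = 1 is an assumption, not given in the example

z = [rng.uniform(0, 10) for _ in range(n)]   # common cause
x = [rng.gauss(zi, sigma) for zi in z]       # depends on z only
y = [rng.gauss(zi, sigma) for zi in z]       # depends on z only, not on x

b_short = ols([x], y)      # y ~ x, with z omitted
b_long = ols([x, z], y)    # y ~ x + z

print("y ~ x     : intercept, slope_x          =", [round(b, 3) for b in b_short])
print("y ~ x + z : intercept, slope_x, slope_z =", [round(b, 3) for b in b_long])
```

With $Z$ omitted the slope for $X$ should come out somewhat below one; with $Z$ included it should drop to roughly zero while the slope for $Z$ is close to one.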
Moreover, notice that, intentionally or not, you have chosen parameters for $Z$ that make its influence harder to notice at first sight. If you used, for example, $\mathcal{U}(0,1)$, the residual pattern would be much more striking.
Basically, similar things will happen no matter how $Z$ is distributed, since the effect arises because both $X$ and $Y$ depend on $Z$. Below you can see plots from a similar model, where $Z$ is normally distributed, $Z \sim \mathcal{N}(0,1)$. For this model $R^2$ increases from $0.26$ to $0.52$ when controlling for $Z$.
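A short back-of-the-envelope derivation shows why the distribution of $Z$ matters so much. Write $X = Z + \varepsilon_X$ and $Y = Z + \varepsilon_Y$, with independent noise terms of standard deviation $\sigma$. Then the population slope of the regression of $Y$ on $X$ alone is
$$ \beta_{Y \sim X} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\mathrm{Var}(Z)}{\mathrm{Var}(Z) + \sigma^2} $$
and, because $\mathrm{Var}(X) = \mathrm{Var}(Y)$, the corresponding $R^2$ is $\beta_{Y \sim X}^2$. Assuming $\sigma = 1$ (my assumption; the value is not stated in the example), this gives:
$$
Z \sim \mathcal{U}(0, 10): \quad \mathrm{Var}(Z) = \tfrac{100}{12} \approx 8.33, \quad \beta_{Y \sim X} \approx 0.89, \quad R^2 \approx 0.80 \\
Z \sim \mathcal{U}(0, 1): \quad \mathrm{Var}(Z) = \tfrac{1}{12} \approx 0.083, \quad \beta_{Y \sim X} \approx 0.08, \quad R^2 \approx 0.006 \\
Z \sim \mathcal{N}(0, 1): \quad \mathrm{Var}(Z) = 1, \quad \beta_{Y \sim X} = 0.5, \quad R^2 = 0.25
$$
These are in line with the estimated slope of $0.88$ and the $R^2$ values of $0.79$ and $0.26$ reported for the models without $Z$. The slope is never equal to one unless $\sigma = 0$, and it moves further from one as the spread of $Z$ shrinks relative to the noise.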
In each case $Y$ depended on $Z$, and its relationship with $X$ was illusory, caused by the fact that they both depend on $Z$. This is an important problem in statistics, but it is not caused by any pitfall of OLS regression, or by our inability to measure bias, but by using a misspecified model that leaves out an important variable.
Coca-Cola adverts do not cause snow to fall and do not make people give each other presents; those things just happen together at Christmas. It would be wrong to model snowfall as predicted by the screenings of Coca-Cola adverts while ignoring the fact that they both happen in December.
Sidenote: I guess that what you might have been thinking of is random design regression (or random regression; e.g. Hsu et al., 2011, An analysis of random design linear regression), but I do not think that the example you provided is relevant for discussing it.
Best Answer
You haven't given context, but you have linked to a post that offers one solution. I will assume that that solution is not applicable here.
Then another solution is to not use linear regression (simple or multiple), since it does not solve the problem you have.
First, though, let's use your example of income as a function of age and education. Here, negative predicted values are reasonable because you are probably not interested in the income of newborn babies. However, taking log(income) is also reasonable there, unless some people in your data set have no income.
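As a small illustration of both points, here is a Python sketch with made-up numbers (the ages and incomes are hypothetical, and `fit_line` is just a hand-rolled simple regression). The straight-line fit predicts a negative income at age 0, far outside the data, while the fit on log(income) cannot produce a negative prediction once it is transformed back.

```python
import math

def fit_line(x, y):
    """Simple least-squares fit; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    return my - slope * mx, slope

# Hypothetical data, for illustration only.
ages = [25, 35, 45, 55]
incomes = [20000, 40000, 60000, 80000]

a, b = fit_line(ages, incomes)
a_log, b_log = fit_line(ages, [math.log(v) for v in incomes])

linear_at_zero = a + b * 0                       # extrapolated to a newborn
log_model_at_zero = math.exp(a_log + b_log * 0)  # back-transformed prediction

print("linear model at age 0:     ", round(linear_at_zero))
print("log(income) model at age 0:", round(log_model_at_zero))
```

Note that `math.log` fails for an income of zero, which is exactly the caveat about people with no income.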
But suppose that's not it. Then you can use a regression method that respects bounds on the dependent variable. One such method is beta regression, which requires a DV that is strictly between 0 and 1, so you could scale your DV to be between 0 and 1 and then use beta regression.
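The scaling step could look roughly like the Python sketch below. The function name, the example scores, and the bounds are hypothetical. The small compression toward 0.5 is one commonly used way to keep exact 0s and 1s strictly inside the interval, and whether it is appropriate depends on your data. Fitting the beta regression itself is left to whatever package you use.

```python
def to_open_unit_interval(values, lower, upper):
    """Map values from [lower, upper] into the open interval (0, 1)."""
    n = len(values)
    # Min-max scaling to [0, 1] using the known bounds of the DV.
    scaled = [(v - lower) / (upper - lower) for v in values]
    # Compress slightly so that exact 0s and 1s move strictly inside (0, 1).
    return [(s * (n - 1) + 0.5) / n for s in scaled]

scores = [0, 12, 50, 87, 100]  # hypothetical DV bounded between 0 and 100
unit_scores = to_open_unit_interval(scores, lower=0, upper=100)
print(unit_scores)
```

Predictions from the beta regression are then on the 0 to 1 scale and can be mapped back to the original bounds by reversing the two steps.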
But I would really urge you to add your actual variables to the question.