The case of "attenuation bias" can be presented more clearly if we examine the probit model, but the result carries over to logistic regression as well.
Underlying the conditional probability models (the logit, probit, and linear probability models), we can postulate a latent (unobservable) linear regression model:
$$y^* = X\beta + u$$
where $y^*$ is a continuous unobservable variable (and $X$ is the regressor matrix). The error term is assumed to be independent of the regressors and to follow a distribution with a density symmetric around zero; in our case, the standard normal, $F_U(u)= \Phi(u)$.
We assume that what we observe, i.e. the binary variable $y$, is an indicator function of the unobservable $y^*$:
$$ y = 1 \;\;\text{if} \;\;y^*>0,\qquad y = 0 \;\;\text{if}\;\; y^*\le 0$$
Then we ask "what is the probability that $y$ will take the value $1$ given the regressors?" (i.e. we are looking at a conditional probability). This is
$$P(y =1\mid X ) = P(y^*>0\mid X) = P(X\beta + u>0\mid X) = P(u> - X\beta\mid X) \\= 1- \Phi (-X\beta) = \Phi (X\beta) $$
the last equality due to the "reflective" property of the standard cumulative distribution function, which comes from the symmetry of the density function around zero. Note that although we have assumed that $u$ is independent of $X$, conditioning on $X$ is needed in order to treat the quantity $X\beta$ as non-random.
If we assume that $X\beta = b_0+b_1X_1 + b_2X_2$, then we obtain the theoretical model
$$P(y =1\mid X ) = \Phi (b_0+b_1X_1 + b_2X_2) \tag{1}$$
Suppose now that $X_2$ is independent of $X_1$ but is erroneously excluded from the specification of the underlying regression. So we specify
$$y^* = b_0+b_1X_1 + \epsilon$$
Assume further that $X_2$ is also a normal random variable $X_2 \sim N(\mu_2,\sigma_2^2)$. But this means that
$$\epsilon = u + b_2X_2 \sim N(b_2\mu_2, 1+b_2^2\sigma_2^2)$$
due to the closure-under-addition of the normal distribution (and the independence assumption). Applying the same logic as before, here we have
$$P(y =1\mid X_1 ) = P(y^*>0\mid X_1) = P(b_0+b_1X_1 + \epsilon>0\mid X_1) = P(\epsilon> - b_0-b_1X_1\mid X_1) $$
Standardizing the $\epsilon$ variable we have
$$P(y =1\mid X_1 )= 1- P\left(\frac{\epsilon-b_2\mu_2}{\sqrt {1+b_2^2\sigma_2^2}}\leq - \frac {(b_0 + b_2\mu_2)}{\sqrt {1+b_2^2\sigma_2^2}}- \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}}X_1\mid X_1\right)$$
$$\Rightarrow P(y =1\mid X_1) = \Phi\left(\frac {(b_0 + b_2\mu_2)}{\sqrt {1+b_2^2\sigma_2^2}}+ \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}}X_1\right) \tag{2}$$
and one can compare models $(1)$ and $(2)$.
This theoretical expression tells us where our maximum likelihood estimator of $b_1$ is going to converge, since it remains a consistent estimator, in the sense that it will converge to the theoretical quantity that actually exists in the misspecified model (and not, of course, in the sense that it will find the "truth" in any case):
$$\hat b_1 \xrightarrow{p} \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}} \implies |\operatorname{plim}\hat b_1|< |b_1| \quad (\text{for } b_2 \neq 0)$$
which is the "bias towards zero" result.
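To see this convergence in action, here is a minimal simulation sketch, assuming `numpy` and `statsmodels` are available (all parameter values are illustrative choices, not from the derivation above): the full probit recovers $b_1$, while the probit omitting $X_2$ converges to the attenuated value $b_1/\sqrt{1+b_2^2\sigma_2^2}$.

```python
# Illustrative simulation of probit attenuation bias; parameter values
# are arbitrary assumptions for demonstration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
b0, b1, b2 = 0.5, 1.0, 1.5
mu2, sigma2 = 0.3, 2.0

x1 = rng.normal(size=n)
x2 = rng.normal(mu2, sigma2, size=n)
u = rng.normal(size=n)                              # u ~ N(0, 1), independent of X
y = (b0 + b1 * x1 + b2 * x2 + u > 0).astype(int)    # observed indicator of y*

full = sm.Probit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
omit = sm.Probit(y, sm.add_constant(x1)).fit(disp=0)

print("b1, full model: ", round(full.params[1], 3))  # ~ 1.0
print("b1, omitting x2:", round(omit.params[1], 3))  # attenuated toward zero
print("predicted plim: ", round(b1 / np.sqrt(1 + b2**2 * sigma2**2), 3))  # ~ 0.316
```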
We used the probit model, and not the logit (logistic regression), because only under normality can we derive the distribution of $\epsilon$. The logistic distribution is not closed under addition. This means that if we omit a relevant variable in logistic regression, we also create distributional misspecification, because the error term (that now includes the omitted variable) no longer follows a logistic distribution. But this does not change the bias result (see footnote 6 in the paper linked to by the OP).
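As a quick check of that last claim, the same simulation style with a standard logistic latent error (again, purely illustrative values) shows that the logit estimate omitting $X_2$ is still attenuated toward zero, even though the error term is now misspecified:

```python
# Same latent-variable setup with a logistic error; the omitted-variable
# logit estimate remains attenuated toward zero (illustrative values).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200_000
b0, b1, b2 = 0.5, 1.0, 1.5
x1 = rng.normal(size=n)
x2 = rng.normal(0.3, 2.0, size=n)
u = rng.logistic(size=n)                            # standard logistic latent error
y = (b0 + b1 * x1 + b2 * x2 + u > 0).astype(int)

omit = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
print("b1, logit omitting x2:", round(omit.params[1], 3), "vs true b1 =", b1)
```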
Omitted variable bias (OVB) is agnostic to the causal relationship between $X$ and $Z$. It concerns only the ability to estimate $\tau$ in the structural model for $Y$, say $Y := \tau X + \beta Z + \epsilon_Y$. The joint distribution of $Y$, $X$, and $Z$ is compatible both with a data-generating process in which $Z$ is a confounder of the $X \rightarrow Y$ relationship, so that $\tau$ represents the total effect of $X$ on $Y$, and with a data-generating process in which $Z$ is a mediator of the $X \rightarrow Y$ relationship, so that $\tau$ represents the direct effect of $X$ on $Y$.
In the confounding model, the data-generating process for $X$ and $Z$ is:
$$
Z := \epsilon_Z \\
X := \gamma Z + \epsilon_X
$$
In the mediation model, the data-generating process for $X$ and $Z$ is:
$$
Z := \alpha X + \epsilon_Z \\
X := \epsilon_X
$$
For the confounding process, omitting $Z$ from the model for $Y$ yields a biased estimate of $\tau$, the total effect of $X$ on $Y$. This is the classic bias due to an omitted confounder.
For the mediation process, the $X \rightarrow Y$ relationship is not confounded. The estimated coefficient $\hat \tau$ in the model omitting $Z$ is unbiased for the total causal effect of $X$ on $Y$. However, it is biased for $\tau$, the direct effect of $X$ on $Y$.
This is all to say that it's possible to have OVB without confounding if the coefficient you are trying to estimate is a direct effect, in which case omitting the mediator yields a biased estimate of this quantity. In the absence of confounding, the model omitting the mediator yields the total effect. The formula for the bias is the same regardless of the data-generating process of $X$ and $Z$, but the interpretation of the biased parameter depends on the causal relationship between $X$ and $Z$.
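To make the contrast concrete, here is a minimal simulation sketch under the linear structural model $Y := \tau X + \beta Z + \epsilon_Y$ written above; the coefficient values ($\tau$, $\beta$, $\gamma$, $\alpha$) are illustrative assumptions:

```python
# Contrast the confounding and mediation DGPs: the omitted-Z regression is
# biased for tau by the same formula in both, but the interpretation differs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500_000
tau, beta = 1.0, 2.0      # structural coefficients in the model for Y
gamma, alpha = 0.5, 0.5   # Z -> X (confounding) and X -> Z (mediation)

# Confounding DGP: Z := eps_Z, X := gamma*Z + eps_X
z_c = rng.normal(size=n)
x_c = gamma * z_c + rng.normal(size=n)
y_c = tau * x_c + beta * z_c + rng.normal(size=n)

# Mediation DGP: X := eps_X, Z := alpha*X + eps_Z
x_m = rng.normal(size=n)
z_m = alpha * x_m + rng.normal(size=n)
y_m = tau * x_m + beta * z_m + rng.normal(size=n)

for name, x, y in [("confounding", x_c, y_c), ("mediation", x_m, y_m)]:
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(name, "tau_hat omitting Z:", round(fit.params[1], 3))
# Confounding: ~1.8, biased for the total effect (tau = 1.0 here).
# Mediation: ~2.0 = tau + alpha*beta, the total effect; biased for
# the direct effect tau.
```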
You need to distinguish the causal graph from the regression coefficients here. Something is only 'spurious' if it does not identify the causal effect of interest, and this depends on the graph structure you have assumed, not on any regression coefficients.
As an example (and restricting ourselves to causal DAG structures with no hidden variables), assume X causes Y and X causes Z. Then even if Z does not cause Y, you will be able to regress Y on Z and get a non-zero coefficient, so that doesn't tell you much. Conditioning on X in a regression of Y on Z is the right thing to do if you want to know the causal effect of Z on Y, assuming that X causes both Y and Z and that Z causes Y rather than vice versa. If, on the other hand, Y causes Z, then despite there being no causal effect to estimate you will again get a non-zero regression coefficient.
It all depends on which variables are connected by causal arrows and which direction those arrows point. It's sometimes useful to simulate data with the relevant structure and run the regressions to get a feel for what can happen.
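For instance, here is a minimal sketch of the example above (coefficients and noise scales are arbitrary illustrative choices): X causes both Y and Z, Z has no effect on Y, yet the marginal regression of Y on Z yields a non-zero coefficient, while adjusting for the common cause X drives it to zero.

```python
# X -> Z and X -> Y, with no Z -> Y edge; illustrative coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)   # X -> Z
y = 1.5 * x + rng.normal(size=n)   # X -> Y, no causal effect of Z on Y

# Marginal regression of Y on Z: non-zero coefficient (~0.73) despite no effect
print(sm.OLS(y, sm.add_constant(z)).fit().params[1])
# Adjusting for the common cause X: coefficient on Z is ~0
print(sm.OLS(y, sm.add_constant(np.column_stack([z, x]))).fit().params[1])
```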
There are some situations where causal structure can be inferred from regressing things on other things and finding zero coefficients, but they are fairly limited. A nice overview can be found in chapter 25 of Shalizi's draft textbook (ch.21-24 are also worth reading). Leaving aside discovery, the basic theoretical framework can be found in compressed form in Pearl's review paper, and as a more leisurely exposition in the references here.
Unfortunately this means that the answer to each of your three questions is "it depends" (on the graph), but the references above should hopefully point you towards what you would have to assume to interpret things the way you're considering.