My impression is that this question does not have a unique, fully general answer, so I will only explore the simplest case, and in a somewhat informal way.
Assume that the true Data Generating Process (DGP) is
$$y_t = y_{t-1} + u_t,\;\; t=1,...,T,\;\; y_0 =0 \tag{1}$$
with $u_t$ the usual zero-mean i.i.d. white-noise term, $E(u_t^2)= \sigma^2_u$. The above also implies that
$$y_t = \sum_{i=1}^tu_i \tag{2}$$
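As a quick numeric sanity check of $(1)$-$(2)$ (a sketch, not part of the argument): simulating the random walk as a cumulative sum of the errors should give $\operatorname{Var}(y_T) \approx T\sigma^2_u$.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200
sigma_u = 1.0

# One path of the DGP (1): y_t = y_{t-1} + u_t, y_0 = 0,
# which by (2) is just the cumulative sum of the errors.
u = rng.normal(0.0, sigma_u, size=T)
y = np.cumsum(u)

# Across many replications, Var(y_T) should be close to T * sigma_u^2.
reps = 20000
yT = rng.normal(0.0, sigma_u, size=(reps, T)).sum(axis=1)
print(yT.var())  # close to T * sigma_u^2 = 200
```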
We specify a model, call it model $A$
$$y_t = \beta y_{t-1} + u_t,\;\; t=1,...,T,\;\; y_0 =0 \tag{3}$$
and we obtain an estimate $\hat \beta$ for the postulated $\beta$ (let's discuss the estimation method only if need arises).
So a $k$-steps-ahead prediction will be
$$\hat y_{T+k} = \hat \beta^k y_T \tag{4}$$
and its MSE will be
$$MSE_A[\hat y_{T+k}] = E\left(\hat \beta^k y_T-y_{T+k}\right)^2 $$
$$=E\left[(\hat \beta^k-1) y_T -\sum_{i=T+1}^{T+k}u_i \right]^2 = E\big[(\hat\beta^k-1)^2 y_T^2\big]+ k\sigma^2_u \tag{5}$$
(the cross term vanishes because the future errors are independent of $\hat \beta$ and $y_T$, and the cross-products among the future errors vanish as well).
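A small Monte Carlo sketch can check the decomposition in $(5)$. Since the estimation method is left open above, I assume here, purely for illustration, OLS without intercept for $\hat \beta$:

```python
import numpy as np

rng = np.random.default_rng(1)

T, k, sigma_u, reps = 100, 5, 1.0, 5000
mse_emp = np.empty(reps)
term_A = np.empty(reps)

for r in range(reps):
    u = rng.normal(0.0, sigma_u, size=T + k)
    y = np.cumsum(u)                                 # the true DGP (1)-(2)
    y_in, y_lag = y[1:T], y[:T - 1]
    beta_hat = (y_lag @ y_in) / (y_lag @ y_lag)      # assumed: OLS, no intercept
    mse_emp[r] = (beta_hat**k * y[T - 1] - y[T + k - 1])**2   # forecast error of (4)
    term_A[r] = (beta_hat**k - 1.0)**2 * y[T - 1]**2

# Decomposition (5): MSE_A = E[(beta_hat^k - 1)^2 y_T^2] + k * sigma_u^2
print(mse_emp.mean(), term_A.mean() + k * sigma_u**2)  # the two should be close
```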
Let's now say that we have differenced our data, and specified a model $B$
$$\Delta y_t = \gamma \Delta y_{t-1} + u_t \tag{6}$$
and obtained an estimate $\hat \gamma$. Our differenced model can be written
$$y_t = y_{t-1} + \gamma (y_{t-1}-y_{t-2}) + u_t \tag{7}$$
so forecasting the level of the process, we will have
$$\hat y_{T+1} = y_{T} + \hat \gamma (y_{T}-y_{T-1})$$
which in reality, given the true DGP will be
$$\hat y_{T+1} = y_{T} + \hat \gamma u_T \tag {8}$$
It is easy to verify then that, for model $B$,
$$\hat y_{T+k} = y_{T} + \big(\hat \gamma + \hat \gamma^2+\cdots+\hat \gamma^k\big)u_T $$
Now, we reasonably expect that, given any "tested and tried" estimation procedure, we will obtain $|\hat \gamma|<1$, since its true value is $0$ (unless we have too few data, or data in very "bad" shape). So we can say that in most cases we will have
$$\hat y_{T+k} = y_{T} + \frac {\hat \gamma - \hat \gamma ^{k+1}}{1-\hat \gamma}u_T \tag{9}$$
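The step to the closed form in $(9)$ is just the finite geometric series $\sum_{j=1}^k \hat \gamma^j = (\hat \gamma - \hat \gamma^{k+1})/(1-\hat \gamma)$. A quick numerical check, with arbitrary illustrative values:

```python
# Iterating the one-step recursion implied by (7)-(8): each step ahead
# adds gamma_hat^j * u_T to the forecast of the level.
gamma_hat, u_T, y_T, k = 0.3, 0.8, 5.0, 10

y_hat = y_T
d = u_T                      # last observed difference equals u_T under the true DGP
for _ in range(k):
    d = gamma_hat * d
    y_hat = y_hat + d

closed = y_T + (gamma_hat - gamma_hat**(k + 1)) / (1 - gamma_hat) * u_T  # eq. (9)
print(y_hat, closed)         # the two agree
```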
and so
$$MSE_B[\hat y_{T+k}] =
E\left[\left(\frac {\hat \gamma - \hat \gamma ^{k+1}}{1-\hat \gamma}\right)^2u_T^2\right] + k\sigma^2_u \tag{10}$$
while I repeat for convenience
$$MSE_A[\hat y_{T+k} ] = E\big[(\hat\beta^k-1)^2 y_T^2\big]+ k\sigma^2_u \tag{5}$$
So, in order for the differenced model to perform better in terms of prediction MSE, we want
$$MSE_B[\hat y_{T+k}] \leq MSE_A[\hat y_{T+k}]$$
$$\Rightarrow E\left[\left(\frac {\hat \gamma - \hat \gamma ^{k+1}}{1-\hat \gamma}\right)^2u_T^2\right] \leq E\big[(\hat\beta^k-1)^2 y_T^2\big] $$
As with the estimator in model $B$, we extend the same courtesy to the estimator in model $A$: we reasonably expect that $\hat \beta$ will be "close to unity".
It is evident that if it so happens that $\hat \beta >1$, the quantity on the right-hand side of the inequality will tend to increase without bound as $k$, the number of forecast-ahead steps, increases. The quantity on the left-hand side, on the other hand, may increase as $k$ increases, but it has an upper bound. So in this scenario, we expect the differenced model $B$ to fare better than model $A$ in terms of prediction MSE.
But assume the case more advantageous to model $A$, where $\hat \beta <1$. Then the right-hand-side quantity is also bounded, and as $k \rightarrow \infty$ we have to examine whether
$$E\left[\left(\frac {\hat \gamma}{1-\hat \gamma}\right)^2u_T^2\right] \leq E\big[y_T^2\big]= T\sigma^2_u\;\; ??$$
(taking $k \rightarrow \infty$ is a convenience; in reality both magnitudes will already be close to their suprema for small values of $k$).
Note that the term $ \left(\frac {\hat \gamma }{1-\hat \gamma}\right)^2$ is expected to be "rather close" to $0$, so model $B$ has an advantage from this aspect.
We cannot separate the remaining expected value, because the estimator $\hat \gamma$ is not independent of $u_T$. But we can transform the inequality into
$$\operatorname{Cov}\left[\left(\frac {\hat \gamma}{1-\hat \gamma}\right)^2,\,u_T^2\right] + E\left[\left(\frac {\hat \gamma}{1-\hat \gamma}\right)^2\right]\cdot \sigma^2_u \leq T\sigma^2_u\;\; ??$$
$$\Rightarrow \operatorname{Cov}\left[\left(\frac {\hat \gamma}{1-\hat \gamma}\right)^2,\,u_T^2\right] \leq \left (T-E\left[\left(\frac {\hat \gamma}{1-\hat \gamma}\right)^2\right]\right)\cdot \sigma^2_u \;\; ??$$
Now, the covariance on the left-hand side is expected to be small, since the estimator $\hat \gamma$ depends on all $T$ errors. As for the right-hand side, $\hat \gamma$ comes from a stationary data set, so the expected value of the above function of $\hat \gamma$ should be much smaller than the sample size (note that this function takes values in $(0,1)$ whenever $\hat \gamma < 1/2$, which is the typical case here).
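A quick simulation sketch supports this: under the true DGP the differenced data are i.i.d., and with no-intercept OLS as an assumed (illustrative) estimator for $\hat \gamma$, the average of $(\hat \gamma/(1-\hat \gamma))^2$ comes out orders of magnitude below $T$.

```python
import numpy as np

rng = np.random.default_rng(3)

T, sigma_u, reps = 100, 1.0, 5000
vals = np.empty(reps)

for r in range(reps):
    d = rng.normal(0.0, sigma_u, size=T)              # differences are i.i.d. under the true DGP
    gamma_hat = (d[:-1] @ d[1:]) / (d[:-1] @ d[:-1])  # assumed: OLS, no intercept
    vals[r] = (gamma_hat / (1.0 - gamma_hat))**2

print(vals.mean())  # a tiny number, nowhere near T = 100
```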
So in all, without discussing any specific estimation method, I believe that we were able to show informally that the differenced model should be expected to perform better in terms of prediction MSE.
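The informal conclusion can be stress-tested with a Monte Carlo sketch; again, the estimation method (no-intercept OLS in both models) is my assumption for illustration, not something argued above.

```python
import numpy as np

rng = np.random.default_rng(2)

T, k, sigma_u, reps = 100, 10, 1.0, 5000
sq_err_A = np.empty(reps)
sq_err_B = np.empty(reps)

for r in range(reps):
    u = rng.normal(0.0, sigma_u, size=T + k)
    y = np.cumsum(u)                                  # the true DGP (1)-(2)

    # Model A: AR(1) in levels, forecast (4)
    beta_hat = (y[:T - 1] @ y[1:T]) / (y[:T - 1] @ y[:T - 1])
    sq_err_A[r] = (beta_hat**k * y[T - 1] - y[T + k - 1])**2

    # Model B: AR(1) in first differences, level forecast via (9)
    d = np.diff(y[:T])
    gamma_hat = (d[:-1] @ d[1:]) / (d[:-1] @ d[:-1])
    fc = y[T - 1] + (gamma_hat - gamma_hat**(k + 1)) / (1 - gamma_hat) * d[-1]
    sq_err_B[r] = (fc - y[T + k - 1])**2

print(sq_err_A.mean(), sq_err_B.mean())  # model B's MSE should come out smaller
```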