+1, I think this is a really interesting and clearly stated question. However, more information will help us think through this situation.
For example, what is the relationship between $x_n$ and $y$? It's quite possible that there isn't one, in which case regression $(1)$ offers no advantage relative to regression $(2)$. (Actually, it is at a very slight disadvantage, in the sense that the standard errors will be slightly larger, and thus the estimated betas will tend, on average, to be slightly further from their true values.) If there is a function mapping $x_n$ to $y$, then, by definition, there is real information there, and regression $(1)$ will be better in the initial situation.
Next, what is the nature of the relationship between $(x_1, \cdots, x_{n-1})$ and $x_n$? Is there one? For instance, when we conduct experiments, we (usually) try to assign equal numbers of study units to each combination of values of the explanatory variables. (This approach uses a multiple of the Cartesian product of the levels of the IVs and is called a 'full factorial' design; there are also cases where levels are intentionally confounded to save data, called 'fractional factorial' designs.) If the explanatory variables are orthogonal, regression $(3)$ will estimate coefficients that are exactly $0$. On the other hand, in an observational study the covariates are almost always correlated, and the stronger that correlation, the less unique information exists in $x_n$. These facts modulate the relative merits of regression $(1)$ and regression $(2)$.
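To make the orthogonality point concrete, here is a minimal sketch (my own illustration, not from the question) using a $2 \times 2$ full factorial design coded as $\pm 1$. The interaction column plays the role of $x_n$; because it is orthogonal to the other columns, regressing it on them recovers coefficients of exactly zero:

```python
import numpy as np

# 2x2 full factorial, coded +/-1: columns A, B, and their interaction
# AB are mutually orthogonal (and each sums to zero).
A = np.array([-1.0, -1.0, 1.0, 1.0])
B = np.array([-1.0, 1.0, -1.0, 1.0])
AB = A * B  # plays the role of x_n

# Regression (3): regress x_n on the remaining predictors (plus intercept)
X = np.column_stack([np.ones(4), A, B])
beta, *_ = np.linalg.lstsq(X, AB, rcond=None)
print(beta)  # all coefficients are 0 (up to floating point)
```

With correlated (observational) covariates, the same regression would instead yield nonzero coefficients and a nonzero $R^2$, quantifying how much of $x_n$ is redundant.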
However, (unfortunately perhaps) it's more complicated than that. One of the important, but difficult, concepts in multiple regression is multicollinearity. Should you attempt to estimate regression $(4)$, you will find that you have perfect multicollinearity, and your software will tell you that the design matrix is not invertible. Thus, while regression $(1)$ may well offer an advantage relative to regression $(2)$, regression $(4)$ will not.
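Perfect multicollinearity is easy to demonstrate numerically. In this sketch (an assumed setup for illustration), one column is an exact linear combination of two others, so the design matrix is rank deficient and $X'X$ is not invertible, which is exactly what your software will complain about:

```python
import numpy as np

# x3 is an exact linear combination of x1 and x2, so the design
# matrix is rank deficient: perfect multicollinearity.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
x3 = 2 * x1 - x2  # exactly collinear with x1 and x2

X = np.column_stack([np.ones(50), x1, x2, x3])
print(np.linalg.matrix_rank(X))  # 3, not 4: X'X is singular
```

Attempting the usual normal-equations solve on such a matrix fails (or, with near-collinearity, produces wildly unstable estimates).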
The more interesting question (and the one you're asking) is what if you use regression $(1)$ to make predictions about $y$ using the estimated $x_n$ values output from the predictions of regression $(3)$? (That is, you're not estimating regression $(4)$—you're plugging the output from the prediction equation estimated in regression $(3)$ into prediction model $(4)$.) The thing is that you aren't actually gaining any new information here. Whatever information exists in the first $n-1$ predictor values for each observation is already being used optimally by regression $(2)$, so there is no gain.
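The "no new information" claim can be checked directly: the predicted $\hat{x}_n$ from regression $(3)$ lies in the span of the first $n-1$ predictors, so adding it to the design does not change the column space, and the fitted values are identical to those from regression $(2)$. A minimal sketch (simulated data of my own, using a least-squares solver that tolerates the resulting rank deficiency):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Design for regression (2): intercept plus two predictors
X_small = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
# x_n correlated with the other predictors, plus noise
x_n = X_small @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)
y = rng.normal(size=n)

# Regression (3): predict x_n from the other predictors
g, *_ = np.linalg.lstsq(X_small, x_n, rcond=None)
x_n_hat = X_small @ g  # lies in the span of X_small's columns

# Fitted values: regression (2) vs. design augmented with predicted x_n
X_aug = np.column_stack([X_small, x_n_hat])
b2, *_ = np.linalg.lstsq(X_small, y, rcond=None)
b_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
yhat2 = X_small @ b2
yhat_aug = X_aug @ b_aug
print(np.allclose(yhat2, yhat_aug))  # True: identical predictions
```

Both fits project $y$ onto the same column space, so the predictions coincide exactly.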
Thus, the answer to your first question is that you might as well go with regression $(2)$ for your predictions to save unnecessary work. Note that I have been addressing this in a fairly abstract way, rather than addressing the concrete situation you describe in which someone hands you two data sets (I just can't imagine this occurring). Instead, I'm thinking of this question as trying to understand something fairly deep about the nature of regression. What does occur on occasion, though, is that some observations have values on all predictors, and some other observations (within the same dataset) are missing some values on some of the predictors. This is particularly common when dealing with longitudinal data. In such a situation, you want to investigate multiple imputation.
Best Answer
I agree with Tim that there is no "one size fits all" approach. One must consider the implications of a "not applicable" response for the variable, one's population of interest, and one's research questions. For example, in some cases an N/A response may indicate that the respondent is not a member of the population from which one wishes to sample; in others, it may indicate either that an event will happen but hasn't yet been measured, or that it will never happen. One's solution might well depend on how one answers such questions.
That said, I've seen some posts (like this one or this one) where the following strategy (or a very similar strategy) is recommended for managing right censored predictors:
Create a variable (CENSORED) and code it as 1 if the event did not occur, and 0 if the event did occur.
Recode the time variable to indicate the amount of time between when the event occurred and when the measurement period ended (TIME2END). Thus, if the event occurred 100 seconds before the measurement period ended, the person would get a score of 100. If the event didn't occur before the period ended, the person would get a score of 0.
These two variables would then both be included in your regression, and should provide reasonably complete info about your predictor.
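The two-variable coding above might look something like this in pandas (a sketch with made-up column names and a hypothetical 300-second measurement window; adapt to your data):

```python
import numpy as np
import pandas as pd

period_end = 300  # assumed length of the measurement window, in seconds

# Hypothetical data: event_time is NaN when the event never occurred
# before the measurement window ended.
df = pd.DataFrame({"event_time": [200.0, np.nan, 250.0, np.nan]})

# CENSORED = 1 if the event did not occur, 0 if it did
df["CENSORED"] = df["event_time"].isna().astype(int)

# TIME2END = seconds between the event and the end of the window;
# censored observations get a score of 0
df["TIME2END"] = np.where(df["CENSORED"] == 1, 0.0,
                          period_end - df["event_time"])

print(df)
```

Both `CENSORED` and `TIME2END` would then be entered as predictors in the regression, per the strategy described above.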
Jon