Note that the correlation conditional on $Z$ is a function of $Z$ (a potentially different number for each value of $Z$), whereas the partial correlation is a single number.
Furthermore, partial correlation is defined from the residuals of linear regressions. Thus, if the actual relationship is nonlinear, the partial correlation may take a different value than the conditional correlation, even when the correlation conditional on $Z$ is a constant that does not depend on $Z$. On the other hand, if $X,Y,Z$ are multivariate Gaussian, the partial correlation equals the conditional correlation.
For an example where a constant conditional correlation $\neq$ partial correlation: $$Z\sim U(-1,1),~X=Z^2+e,~Y=Z^2-e,~e\sim N(0,1),e\perp Z.$$ No matter which value $Z$ takes, the conditional correlation is $-1$. However, the linear regressions of $X$ on $Z$ and of $Y$ on $Z$ have zero slope, since $\mathrm{Cov}(Z,Z^2)=0$, so the residuals are just the (centered) values of $X$ and $Y$ themselves. Thus, the partial correlation equals the correlation between $X$ and $Y$, which does not equal $-1$: clearly the variables are not perfectly correlated when $Z$ is not known.
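A quick numerical check of this example (a numpy sketch; the seed, sample size, and the particular $Z$-band used to estimate the conditional correlation are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.uniform(-1, 1, n)
e = rng.normal(0, 1, n)
x = z**2 + e
y = z**2 - e

# Conditional correlation: within a narrow band of Z, X and Y are
# (c + e, c - e) up to a nearly constant c = Z^2, so corr is about -1.
band = (z > 0.4) & (z < 0.45)
cond_corr = np.corrcoef(x[band], y[band])[0, 1]

# Partial correlation: regress X and Y on Z (slopes come out near 0
# because Cov(Z, Z^2) = 0 for Z ~ U(-1,1)), then correlate residuals.
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
partial_corr = np.corrcoef(rx, ry)[0, 1]

# Theoretical partial correlation: (Var(Z^2) - 1)/(Var(Z^2) + 1)
# with Var(Z^2) = 1/5 - 1/9 = 4/45, giving -41/49 ≈ -0.837.
print(cond_corr, partial_corr)
```

The two numbers disagree, as claimed: the conditional correlation estimate is essentially $-1$ while the partial correlation is around $-0.84$.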
Apparently, Baba and Sibuya (2005) show the equivalence of partial correlation and conditional correlation for some other distributions besides the multivariate Gaussian, though I have not read it.
The answer to your question 2 seems to be in the Wikipedia article, in the second equation under "Using recursive formula".
The main reason for the "reversal" you are seeing between AR and MA processes is that these processes are generally invertible into the form of the other process (so long as the relevant roots lie inside the unit circle). So a finite AR process can be represented as an infinite MA process, and a finite MA process can be represented as an infinite AR process. For a general MA($q$) process you have:
$$Z_t = \Bigg( 1 - \sum_{i=1}^q \theta_i B^i \Bigg) \epsilon_t = \prod_{i=1}^q (1 - \tau_i B) \epsilon_t,$$
where $B$ is the backshift operator. If $\max_i |\tau_i| < 1$ (so that all the roots $\tau_i$ are inside the unit circle) then the process is invertible and we have:
$$\epsilon_t = \prod_{i=1}^q (1 - \tau_i B)^{-1} Z_t = \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) Z_t.$$
Re-arranging this expression gives the AR($\infty$) process:
$$Z_t = \Bigg[ 1 - \prod_{i=1}^q \Bigg( \sum_{k=0}^\infty \tau_i^k B^k \Bigg) \Bigg] Z_t + \epsilon_t.$$
Now, the PACF gives you the conditional correlation at a given lag, conditional on the values at the intervening times. For an AR($p$) process, this picks out the direct AR coefficients, so the PACF cuts off after lag $p$. Hence, for an invertible MA process, the PACF reflects the coefficients of the corresponding AR($\infty$) representation, and the PACF values decay gradually because the AR process being measured is infinite.
Best Answer
For a while forget about time stamps. Consider three variables: $X, Y, Z$.
Let's say $Z$ has a direct influence on the variable $X$. You can think of $Z$ as some economic parameter in US which is influencing some other economic parameter $X$ of China.
Now it may be that a parameter $Y$ (some parameter in England) is also directly influenced by $Z$. But there is an independent relationship between $X$ and $Y$ as well. By independence here I mean that this relationship is independent of $Z$.
So you see, when $Z$ changes, $X$ changes both because of the direct relationship between $X$ and $Z$, and because $Z$ changes $Y$, which in turn changes $X$. So $X$ changes for two reasons.
Now read this with $Z=y_{t-h}, \ \ Y=y_{t-h+\tau}$ and $X=y_t$ (where $h>\tau$).
Autocorrelation between $X$ and $Z$ will take into account all changes in $X$ whether coming from $Z$ directly or through $Y$.
Partial autocorrelation removes the indirect impact of $Z$ on $X$ coming through $Y$.
How is it done? That is explained in the other answer to your question.
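The direct-vs-indirect distinction above can be checked numerically. The sketch below (seed, $\phi=0.8$, and sample size are arbitrary) simulates an AR(1) process: at lag 2 the plain autocorrelation is about $\phi^2$ because of the indirect path through the intervening value, while the partial autocorrelation, obtained by regressing out that intervening value, is about 0, since an AR(1) has no direct effect beyond lag 1:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.8, 100_000
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()

x, y_mid, z_far = z[2:], z[1:-1], z[:-2]   # y_t, y_{t-1}, y_{t-2}

# Plain autocorrelation at lag 2 picks up the indirect path
# y_{t-2} -> y_{t-1} -> y_t, so it comes out near phi**2 = 0.64.
acf2 = np.corrcoef(x, z_far)[0, 1]

# Partial autocorrelation at lag 2: regress both y_t and y_{t-2}
# on the intervening value y_{t-1}, then correlate the residuals.
rx = x - np.polyval(np.polyfit(y_mid, x, 1), y_mid)
rz = z_far - np.polyval(np.polyfit(y_mid, z_far, 1), y_mid)
pacf2 = np.corrcoef(rx, rz)[0, 1]

print(acf2, pacf2)
```

This is exactly the "removal of the indirect impact" described above: once the intervening value is accounted for, nothing of the lag-2 correlation remains for an AR(1).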