To understand this, I always prefer the Cholesky decomposition of the correlation matrix.
Assume the correlation matrix R of the three variables $X, Y, Z$ is
$$ \text{ R =} \left[ \begin{array} {rrr}
1.00& -0.29& -0.45\\
-0.29& 1.00& 0.93\\
-0.45& 0.93& 1.00
\end{array} \right]
$$
Then the Cholesky decomposition L (lower triangular, with $R = LL^\top$) is
$$ \text{ L =} \left[ \begin{array} {rrr}
X\\ Y \\ Z \end{array} \right] = \left[ \begin{array} {rrr}
1.00& 0.00& 0.00\\
-0.29& 0.96& 0.00\\
-0.45& 0.83& 0.32
\end{array} \right]
$$
The matrix L gives the coordinates of the three variables in a Euclidean space when the variables are seen as unit-length vectors from the origin: each row holds the coordinates of one variable, the first axis is aligned with the variable/vector X, and the second axis lies in the plane spanned by X and Y.
Then the correlation of X and Y is $\newcommand{\corr}{\rm corr} \corr(X,Y)=x_1 y_1 + x_2 y_2 + x_3 y_3$, where $x_i$ and $y_i$ are the entries of the X and Y rows of L, and we see immediately that $\corr(X,Y)=-0.29$ because of the zeros and the unit factor in the row of X. We also see immediately that $\corr(X,Z)=-0.45$, again because of the zeros and the unit factor. However, the correlation between Y and Z is $\corr(Y,Z) = (-0.29) \cdot (-0.45) + 0.96 \cdot 0.83$. The partial correlation (after X is removed) comes from that part in which no variance of the X-variable is present, so $\corr(Y,Z)._X \propto 0.96 \cdot 0.83$. (Strictly, this product is the partial covariance; dividing it by the lengths of the X-free parts of Y and Z, $0.96$ and $\sqrt{0.83^2+0.32^2}$, gives the partial correlation, about $0.93$. The sign is unaffected.) Now imagine the value $0.83$ were $-0.83$ instead. Then the partial correlation would be negative and the correlation between Y and Z would be $0.29 \cdot 0.45 - 0.96 \cdot 0.83$.
What we see is that the partial correlations are partly independent of the overall correlations (though within some bounds).
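The geometric reading above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`free_of_x`, `partial_yz_x`, and so on) are mine, not part of the original argument. Because R is given only to two decimals, the computed entries of L may differ from the printed ones in the last digit.

```python
import numpy as np

# Correlation matrix of X, Y, Z as given in the text (rounded to two decimals).
R = np.array([
    [1.00, -0.29, -0.45],
    [-0.29, 1.00, 0.93],
    [-0.45, 0.93, 1.00],
])

# Lower-triangular Cholesky factor with R = L @ L.T.
# Row i holds the coordinates of variable i; the first axis is aligned with X.
L = np.linalg.cholesky(R)
x, y, z = L

# Full correlation of Y and Z: dot product over all coordinates.
corr_yz = y @ z

# Split the dot product into the part along X and the part orthogonal to X.
shared_with_x = y[0] * z[0]
free_of_x = y[1:] @ z[1:]

# Partial correlation: normalise the X-free part by the lengths of the
# X-free components of Y and Z.
partial_yz_x = free_of_x / (np.linalg.norm(y[1:]) * np.linalg.norm(z[1:]))

# Cross-check against the usual first-order partial correlation formula.
r_xy, r_xz, r_yz = R[0, 1], R[0, 2], R[1, 2]
partial_formula = (r_yz - r_xy * r_xz) / np.sqrt((1 - r_xy**2) * (1 - r_xz**2))

print(np.round(L, 2))
print(corr_yz, shared_with_x, free_of_x)
print(partial_yz_x, partial_formula)
```

The two partial correlation values agree, which shows that reading the partial correlation off the X-free coordinates of L is the same computation as the textbook formula.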
Note that correlation conditional on $Z$ is a variable that depends on $Z$, whereas partial correlation is a single number.
Furthermore, partial correlation is defined based on the residuals from linear regression. Thus, if the actual relationship is nonlinear, the partial correlation may take a different value than the conditional correlation, even if the correlation conditional on $Z$ is a constant independent of $Z$. On the other hand, if $X,Y,Z$ are multivariate Gaussian, the partial correlation equals the conditional correlation.
For an example where constant conditional correlation $\neq$ partial correlation: $$Z\sim U(-1,1),~X=Z^2+e,~Y=Z^2-e,~e\sim N(0,1),~e\perp Z.$$ No matter which value $Z$ takes, the conditional correlation will be $-1$. However, the linear regressions of $X$ on $Z$ and of $Y$ on $Z$ have zero slope (since $\operatorname{cov}(Z^2,Z)=0$), so the fitted values are constants and the residuals are just the centered $X$ and $Y$. Thus, the partial correlation equals the correlation between $X$ and $Y$, which is $-41/49 \approx -0.84$ rather than $-1$, as clearly the variables are not perfectly correlated if $Z$ is not known.
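A quick simulation can illustrate this example. This is only a sketch, assuming NumPy; the helper `residuals`, the sample size, the seed, and the bin around $Z = 0.5$ used to approximate conditioning are all my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulate the example: Z ~ U(-1, 1), e ~ N(0, 1) independent of Z.
z = rng.uniform(-1, 1, n)
e = rng.standard_normal(n)
x = z**2 + e
y = z**2 - e

def residuals(v, z):
    """Residuals of the least-squares linear regression of v on z (with intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

# Partial correlation: correlate the residuals of the two linear regressions.
partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

# Approximate the conditional correlation by restricting to a narrow bin of Z.
mask = np.abs(z - 0.5) < 0.01
conditional = np.corrcoef(x[mask], y[mask])[0, 1]

print("partial correlation:    ", partial)
print("conditional (Z near 0.5):", conditional)
```

Under these assumptions the partial correlation should land near the theoretical value $-41/49$, while the within-bin correlation is very close to $-1$.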
Apparently, Baba and Sibuya (2005) show the equivalence of partial correlation and conditional correlation for some other distributions besides the multivariate Gaussian, but I have not read the paper.
The answer to your question 2 appears to be in the Wikipedia article: see the second equation under "Using recursive formula".
Best Answer
Mediation is a causal concept. It specifically refers to the causal relationship $A \rightarrow C \rightarrow B$; that is, $C$ comes temporally between $A$ and $B$ and is caused by $A$ and causes $B$. A mediator is a type of variable that has this property.
Partial correlation is a statistical concept that involves "partialing out" (i.e., removing) the association between $A$ and $C$ and between $B$ and $C$ before computing the correlation between $A$ and $B$. This identifies the association between $A$ and $B$ that is unrelated (linearly) to $C$. The types of variables that $A$, $B$, and $C$ are (i.e., whether they are mediators, treatments, outcomes, confounders, etc.) are irrelevant to computing and describing partial correlation.
When $A$ precedes $C$ and $C$ precedes $B$ (i.e., so $A$ is a treatment, $C$ is a potential mediator, and $B$ is an outcome), then the partial correlation of $A$ and $B$ controlling for $C$ is the direct effect of $A$ on $B$¹. The direct effect is a mediation concept and refers to the part of the relationship between $A$ and $B$ that is not mediated through $C$. When these variables have different meanings and causal orderings, the partial correlations may have different interpretations.
¹ Only when certain causal and modeling assumptions are met.
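To make the mediation case concrete, here is a minimal simulation sketch, assuming NumPy and a linear Gaussian model $A \rightarrow C \rightarrow B$. The path coefficients (0.8, 0.7, 0.5), the seed, and the helper `partial_corr` are hypothetical choices for illustration only. It compares a setting with no direct path from $A$ to $B$ against one with a direct path.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def partial_corr(a, b, c):
    """First-order partial correlation of a and b controlling for c."""
    r = np.corrcoef([a, b, c])
    r_ab, r_ac, r_bc = r[0, 1], r[0, 2], r[1, 2]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Hypothetical linear structural equations: A causes C, C causes B.
a = rng.standard_normal(n)
c = 0.8 * a + rng.standard_normal(n)

# Case 1: A affects B only through C (no direct path).
b_mediated = 0.7 * c + rng.standard_normal(n)

# Case 2: A also affects B directly.
b_direct = 0.7 * c + 0.5 * a + rng.standard_normal(n)

marginal_mediated = np.corrcoef(a, b_mediated)[0, 1]
partial_mediated = partial_corr(a, b_mediated, c)
partial_direct = partial_corr(a, b_direct, c)

print("corr(A, B), mediated only:   ", marginal_mediated)
print("partial corr, mediated only: ", partial_mediated)
print("partial corr, with direct path:", partial_direct)
```

In this sketch the marginal correlation is clearly nonzero in both cases, while the partial correlation controlling for $C$ is close to zero only when there is no direct path. This holds under the linearity and no-confounding assumptions built into the simulation, in line with the footnote.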