My answer will be based on the script of an intersection theory class by Professor Barbara Fantechi found here: http://www.cmi.ac.in/~asengupta/int.pdf
(see page 4). The proof of the central Lemma is rather technical, so if I have overlooked something, please point it out in the comments. The crucial point in the argument is the reduction to a local ring computation, and the following Lemma is the key to this reduction:
Lemma: Let $X,Y$ be separated schemes of finite type over a field $k$, equidimensional of the same dimension, and let $f: X \to Y$ be a dominant morphism. Let $Y' \subset Y$ be an irreducible component and let $A=\mathcal{O}_{Y',Y}$ be the local ring at its generic point. Note that we have a canonical morphism $\text{Spec}(A) \to Y$. Then for the fiber product of $f$ with this morphism we have
$$\text{Spec}(A) \times_Y X = \text{Spec}( \prod_i \mathcal O_{X_i,X}),$$
where the $X_i$ are the irreducible components of $X$ mapping dominantly to $Y'$.
Note that for $f$ proper, $B=\prod_i \mathcal O_{X_i,X}$ contains all the information needed to compute the coefficient of $(Y')^{red}$ in $f_* [X]$, namely the multiplicity $l(\mathcal O_{X_i,X})$ of $X_i^{red}$ in $[X]$ and the degree of the field extension $[R(X_i^{red}):R((Y')^{red})]=[R(\mathcal O_{X_i,X}):R(A)]$, where $R$ denotes the function field and the total quotient ring, respectively.
Let me carry out the desired reduction using this Lemma: to compare the coefficients of $Y_j'$ as described in your question, we may base change the map $f':X' \to Y'$ along the inclusion of the spectrum of $A=\mathcal O_{Y'_j,Y}$ into $Y'$.
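Spelled out, and as a sketch assuming the usual definitions $[X]=\sum_i l(\mathcal O_{X_i,X})\,[X_i^{red}]$ and $f_*[V]=[R(V):R(f(V))]\,[f(V)]$ for a subvariety $V$ with $\dim f(V)=\dim V$, these two pieces of data combine as follows, where the sum runs over the components $X_i$ of $X$ dominating $Y'$ (the other components do not contribute to this coefficient):
$$\operatorname{coeff}_{(Y')^{red}}\big(f_*[X]\big) \;=\; \sum_i l(\mathcal O_{X_i,X})\,\big[R(X_i^{red}):R((Y')^{red})\big].$$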
$$
\begin{array}{ccccc}
\text{Spec}(B)& \to & X' & \to & X \cr
\downarrow& & \downarrow & & \downarrow \cr
\text{Spec}(A)& \to & Y' & \to & Y
\end{array}.
$$
However, $X'$ was already a cartesian product, so we may as well base change the map $f:X \to Y$ by $\text{Spec}(A) \to Y$. By the Fact from the script (sometimes called the going-down theorem for flat morphisms), $Y'_j$ dominates $Y$, so the morphism $\text{Spec}(A) \to Y$ factors through $\text{Spec}(R(Y))$. Applying the Lemma above again, we see that the squares in the following diagram are cartesian:
$$
\begin{array}{ccccc}
\text{Spec}(B)& \to & \text{Spec}(R(X)) & \to & X \cr
\downarrow& & \downarrow & & \downarrow \cr
\text{Spec}(A)& \to & \text{Spec}(R(Y)) & \to & Y
\end{array}.
$$
Indeed, the outer rectangle is cartesian by definition, and the right square is cartesian by the Lemma applied to $f:X \to Y$, since $X,Y$ were assumed to be irreducible varieties of the same dimension; by the pasting lemma for fiber products, the left square is then cartesian as well. From here on, the proof from the script above should go through straightforwardly.
Proof of the Lemma: Because the morphism $\text{Spec}(A) \to Y$ factors through the complement of the irreducible components of $Y$ other than $Y'$, we may assume that $Y=Y'$ is irreducible. Next, the pairwise intersections of the irreducible components of $X$ all have dimension less than $\dim Y$, so the closures of their images in $Y$ are proper closed subsets. Removing these closures, i.e. shrinking $Y$ further, we may assume that $X$ is a disjoint union of irreducible schemes $X_i$ of finite type. If some of them do not map dominantly to $Y$, we can shrink $Y$ even further to exclude the closures of their images, and hence we may assume that they all map dominantly to $Y$. Each of these shrinkings is harmless, because $\text{Spec}(A) \to Y$ factors through every open subset containing the generic point of $Y'$.
Now we want to reduce to an affine computation. Shrink $Y$ to an affine open subset $\text{Spec}(C)$, fix one $X_i$ and cover it by open affines $\text{Spec}(D)$. We will see that all the corresponding fiber products $\text{Spec}(A) \times_{\text{Spec}(C)} \text{Spec}(D)$ are canonically identified with $\text{Spec}(\mathcal O_{X_i,X})$; thus gluing them is trivial and the proof is finished. As $X_i$ and $Y$ are irreducible, there are unique minimal primes $p_D \subset D$ and $p_C \subset C$ corresponding to the generic points $\eta_{X_i},\eta_Y$ of $X_i,Y$. In particular, the local ring at $\eta_Y$ is $A=C_{p_C}$, and likewise $\mathcal O_{X_i,X}=D_{p_D}$. Because $X_i$ dominates $Y$, $\eta_{X_i}$ maps to $\eta_Y$, so for the corresponding ring map $\phi:C \to D$ we have $\phi^{-1}(p_D) = p_C$. We now have to show
$$C_{p_C} \otimes_{C} D = D_{p_D}.$$
The left-hand side is exactly $(C \setminus p_C)^{-1} D$, so for $d \in D \setminus p_D$ we have to find an inverse using only elements of $C \setminus p_C$ as denominators; since $\phi(C \setminus p_C) \subset D \setminus p_D$, this suffices by the universal property of localization. Here, finally, we use that $X$ and $Y$ have the same dimension: it implies that the corresponding extension $R(C) \subset R(D)$ of total quotient rings is algebraic, because both have the same transcendence degree over $k$. Hence there are $c_n, \ldots, c_0 \in R(C)$ with $c_0 \neq 0$ in $R(C)$ and
$$c_n d^n + \ldots + c_1 d + c_0 = 0 \in R(D).$$
Clearing denominators, we may assume that all $c_i$ lie in $C$ and that $c_0 \in C \setminus p_C$. One then checks that this implies
$$d \cdot \frac{c_n d^{n-1} + \ldots + c_1}{-c_0} = 1 \in D_{p_D}. $$
Thus the proof is finished.
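To see the inverse trick in a toy case, here is a minimal Python check under assumptions that are not part of the argument above: $k=\mathbb Q$, $C=\mathbb Q[t]$, $D=\mathbb Q[s]$, $\phi(t)=s^2$ (so $p_C=p_D=0$), and $d=s+1$. Then $d^2-2d+(1-t)=0$, i.e. $c_2=1$, $c_1=-2$, $c_0=1-t$, and the formula above gives $d^{-1}=(c_2d+c_1)/(-c_0)=(s-1)/(t-1)$, whose denominator lies in $C\setminus p_C$. The helper functions are hypothetical and only do naive polynomial arithmetic on coefficient lists.

```python
# Polynomials in one variable as integer coefficient lists, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def scale(c, p):
    return trim([c * a for a in p])


def phi(poly_in_t):
    """Map a polynomial in t into D = Q[s] along the assumed map t -> s^2."""
    out = [0] * (2 * len(poly_in_t) - 1)
    for i, a in enumerate(poly_in_t):
        out[2 * i] = a
    return trim(out)


d = [1, 1]                       # d = s + 1 in D
c2, c1, c0 = [1], [-2], [1, -1]  # c_2 = 1, c_1 = -2, c_0 = 1 - t in C

# The algebraic relation c_2 d^2 + c_1 d + c_0 = 0, checked in D.
relation = add(add(mul(phi(c2), mul(d, d)), mul(phi(c1), d)), phi(c0))

# Numerator c_2 d + c_1 of the inverse; d times it should equal -c_0 = t - 1.
numerator = add(mul(phi(c2), d), phi(c1))
product = mul(d, numerator)

print("relation:", relation)            # zero polynomial
print("d * numerator:", product)        # s^2 - 1
print("-c0 in D:", phi(scale(-1, c0)))  # image of t - 1
```

Since $d\cdot(s-1)=t-1$ with $t-1\in C\setminus p_C$, the element $d$ is invertible in $(C\setminus p_C)^{-1}D$, in line with $C_{p_C}\otimes_C D=D_{p_D}=\mathbb Q(s)$ in this example.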
Here is my partial progress, from a while ago. I don't think I really got anywhere.
Let $I$ be the ideal sheaf of the image of the diagonal morphism. Then we need to find a subsheaf $J \subset I$, locally generated by sections, such that $J=I$ on some open neighborhood of the diagonal. We can simply take $J$ to be the maximal subsheaf of $I$ locally generated by global sections. We also need to choose a neighborhood of the diagonal, and I see no way to choose one other than the union of $U_0 \times U_0$ over all affine opens $U_0$. Given a section, we want to extend it to all affine open neighborhoods of $U_0 \times U_0$. Since it is enough to check on a basis of affine opens, it suffices to check that it extends to all affine opens of the form $A \times B$ with $A,B \subset X$ affine open and $U_0 \subset A \cap B$.
Thus we have a scheme $A \cup B$ with $A,B$ affine and an affine open $U_0 \subset A \cap B$. We have a function on $U_0 \times U_0$ that vanishes on the diagonal; without loss of generality it is $f \otimes 1 - 1 \otimes f$ for some $f \in \mathcal O(U_0)$. We want to extend this to a function on $A \times B$. Such a function could take the form $r \otimes u - s \otimes t$, where $r$ and $s$ are functions on $A$, $t$ and $u$ are functions on $B$, $s$ and $u$ are nonvanishing on $U_0$, $f=r/s=t/u$ on $U_0$, and $ru-st=0$ on all of $A \cap B$. It is easy to find pairs satisfying every condition but the last, but there is no obvious way to get them to satisfy the last one as well.
One approach would be to first find $r/s$ satisfying the first two properties, then extend $r$ and $s$ from $A \cap B$ to $B$ as fractions, then divide those fractions and take the numerator and denominator of the quotient as $t$ and $u$. Unfortunately, a function can only be written as a fraction on each affine separately, and on different affines the fractions may differ, so $ru-st$ need not be $0$. It is easy to create examples of unextendable functions.
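The shape $r \otimes u - s \otimes t$ can be motivated by a short computation, sketched here assuming $r=fs$ and $t=fu$ on $U_0$: on $U_0\times U_0$ the candidate is a unit multiple of $f\otimes 1-1\otimes f$ (as $s\otimes u$ is a unit there), and pulling it back along the diagonal of $A\times B$, which is the diagonal image of $A\cap B$, gives $ru-st$:
$$(s\otimes u)\,(f\otimes 1 - 1\otimes f) \;=\; fs\otimes u - s\otimes fu \;=\; r\otimes u - s\otimes t \quad\text{on } U_0\times U_0,$$
$$\Delta^*(r\otimes u - s\otimes t) \;=\; ru - st \quad\text{on } A\cap B.$$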
But it is hard to produce a counterexample from this, because if you make the function unextendable on $B$, an adversary trying to prove that the morphism is a strong immersion can just try to extend it the same way on $A$. Thus, you need to find an affine scheme with a very bad open subset, and then embed that subset into a very different affine scheme where it is still very bad but in a different way.
My attempts to construct a counterexample by force led to a big, confusing mess. You want to set up a "dictionary" in which nice, well-behaved functions on $A$ become very ugly glued-together messes of fractions of functions on $B$, with the functions in those fractions described by a second dictionary that turns them into ugly messes on $A$, so that someone who tries to extend a function consistently both ways is left with an infinite regress. But it is not clear that this definition does not itself create an infinite regress, or, if it does not, what the properties of the resulting object are.
Best Answer
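If $f$ is finite flat of degree $d$, then $f \times f \colon X \times X \to Y \times Y$ has degree $d^2$, so $(f \times f)^{-1}(\Delta_Y)$ is finite flat of degree $d^2$ over $\Delta_Y$, whereas $\Delta_f \colon \Delta_X \to \Delta_Y$ has degree $d$. So $(f \times f)^{-1}(\Delta_Y) = \Delta_X$ cannot hold scheme-theoretically unless $d=1$, i.e. unless $f$ is an isomorphism.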
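A toy count illustrating the degrees, under assumptions not in the answer: take $f(x)=x^2$ on $\mathbb A^1$ over $\mathbb F_p$ with $p$ odd, and look over a nonzero square $y$, where $f$ is étale and the fiber splits completely, so counting $\mathbb F_p$-points recovers the degrees. There $\Delta_X$ has $d=2$ points and $(f\times f)^{-1}(\Delta_Y)$ has $d^2=4$. The function name is hypothetical.

```python
def fiber_counts(p, y):
    """Count F_p-points over y in Delta_X and in (f x f)^{-1}(Delta_Y) for f(x) = x^2."""
    roots = [x for x in range(p) if (x * x - y) % p == 0]  # fiber of f over y
    pairs = [(a, b) for a in roots for b in roots]          # fiber of f x f over (y, y)
    return len(roots), len(pairs)


print(fiber_counts(7, 2))  # (2, 4); 2 is a nonzero square mod 7 since 3^2 = 9 = 2
```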
A fairly explicit case is the finite étale Galois case, where $(f \times f)^{-1}(\Delta_Y)$ is the disjoint union of the graphs $\Gamma_\sigma$ of deck transformations $\sigma \colon X \to X$.
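As a minimal sanity check of this decomposition, assume the standard toy example $f\colon \mathbb G_m \to \mathbb G_m$, $x \mapsto x^2$, in characteristic $\neq 2$, which is finite étale Galois with deck group $\{\mathrm{id}, \sigma\}$, $\sigma(x)=-x$. The sketch below (function name hypothetical) compares $\mathbb F_p$-points only, so it illustrates rather than proves the statement.

```python
def galois_decomposition_holds(p):
    """On F_p-points of G_m x G_m, check (f x f)^{-1}(Delta) = Gamma_id disjoint union Gamma_sigma for f(x) = x^2."""
    units = range(1, p)  # F_p-points of G_m
    preimage = {(a, b) for a in units for b in units if (a * a - b * b) % p == 0}
    graph_id = {(a, a) for a in units}            # Delta_X, the graph of the identity
    graph_sigma = {(a, (-a) % p) for a in units}  # graph of sigma(x) = -x
    return preimage == graph_id | graph_sigma and graph_id.isdisjoint(graph_sigma)


print(all(galois_decomposition_holds(p) for p in (3, 5, 7, 11, 13)))  # True
```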
If $f$ is only étale but not Galois, then $(f\times f)^{-1}(\Delta_Y)$ will be smooth and $\Delta_X$ is still a connected component, but the other components will not map isomorphically onto $X$ under their projections.
If $f$ is generically étale (i.e. separable), then there is a dense open $U \subseteq Y$ above which the above holds, so $(f \times f)^{-1}(\Delta_Y)$ is still generically smooth (i.e. geometrically reduced). But for each component of the branch divisor $D \subseteq Y$, there will be another irreducible component of $(f \times f)^{-1}(\Delta_Y)$ intersecting $\Delta_X$ above $D$.
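For a concrete picture, assume the toy example $f(x)=x^2$ on $\mathbb A^1$ in odd characteristic, whose branch divisor is the point $0$. Here $(f\times f)^{-1}(\Delta_Y)=V(x^2-y^2)$ has the components $\Delta_X=\{x=y\}$ and $\{x=-y\}$, and the sketch below (function name hypothetical) checks on $\mathbb F_p$-points that they meet only at $(0,0)$, which lies over the branch point.

```python
def components_meet(p):
    """F_p-points where Delta_X = {x = y} meets the other component {x = -y} for f(x) = x^2 on A^1."""
    points = range(p)  # F_p-points of A^1
    diagonal = {(a, a) for a in points}
    other = {(a, (-a) % p) for a in points}
    meeting = diagonal & other
    images = {a * a % p for a, _ in meeting}  # points of Y lying under the intersection
    return meeting, images


print(components_meet(7))  # ({(0, 0)}, {0})
```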
In the inseparable case, you get situations where $\Delta_X$ occurs with multiplicity $>1$, but that only happens in positive characteristic.
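A minimal illustration, assuming the standard example of the Frobenius $f(x)=x^p$ on $\mathbb A^1$ in characteristic $p$: since $x^p-y^p=(x-y)^p$, the preimage $(f\times f)^{-1}(\Delta_Y)=V((x-y)^p)$ is $\Delta_X$ with multiplicity $p$. The check below expands $(x-y)^p$ and reduces the binomial coefficients mod $p$; the function name is hypothetical.

```python
from math import comb


def frobenius_diagonal_coeffs(p):
    """Coefficients of (x - y)^p mod p, indexed by the power k of x (term x^k * y^(p-k))."""
    return [comb(p, k) * (-1) ** (p - k) % p for k in range(p + 1)]


for p in (2, 3, 5, 7):
    # Only x^p and -y^p survive, i.e. (x - y)^p = x^p - y^p mod p.
    expected = [(-1) % p] + [0] * (p - 1) + [1]
    print(p, frobenius_diagonal_coeffs(p) == expected)
```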