Here is one explanation of why countable choice is not problematic in constructive mathematics.
For this discussion it is useful to formulate the axiom of choice as follows:
$(\forall x \in X . \exists y \in Y . R(x,y)) \implies \exists f \in Y^X . \forall x \in X . R(x,f(x))$
This says that a total relation $R \subseteq X \times Y$ contains a function. The usual formulation of the axiom of choice is equivalent to the above one. Indeed, if $(S_i)_{i \in I}$ is a family of non-empty sets we take $X = I$, $Y = \bigcup_i S_i$ and $R(i,x) \iff x \in S_i$ to obtain a choice function $f : I \to \bigcup_i S_i$. Conversely, given a total relation $R \subseteq X \times Y$, consider the the family $(S_x)_{x \in X}$ where $S_x = \lbrace y \in Y \mid R(x,y)\rbrace$ and apply the usual axiom of choice.
One way of viewing sets in constructive mathematics is to imagine that they are collections together with given equality, i.e., some sort of "presets" equipped with equivalence relations. This actually makes sense if you think about how we implement abstract sets in computers: each element of the abstract set is represented by a finite sequence of bits, where each element may have many valid representations (and this is unavoidable in general). Let me give two specific examples:
a natural number $n \in \mathbb{N}$ is represented in the usual binary system, and let us allow leadings zeroes, so that $42$ is represented by $101010$ as well as $0101010$, $00101010$, etc.
a (computable) real $x \in \mathbb{R}$ is represented by machine code (a binary string) that computes arbitrarily good approximations of $x$. Specifically, a piece of code $p$ represents $x$ when $p(n)$ outputs a rational number that differs from $x$ by at most $2^{-n}$. Of course we only represent computable reals this way, and every computable real has many different representations.
Let me write $\mathbb{R}$ for the set of computable reals, because those are the only reals relevant to this discussion.
An essential difference between the first and the second example is that there is a computable canonical choice of representatives for elements of $\mathbb{N}$ (chop off the leading zeroes), whereas there is no such canonical choice for $\mathbb{R}$, for if we had it we could decide equality of computable reals and consequently solve the Halting problem.
According to the constructive interpretation of logic, a statement of the form
$\forall x \in X . \exists y \in Y . R(x, y)$
holds if there is a program $p$ which takes as input a representative for $x \in X$ and produces a representative for $y \in Y$, together with a witness for $R(x,y)$. Crucially, $p$ need not respect equality of $X$. For example,
$\forall x \in \mathbb{R} . \exists n \in \mathbb{N} . x < n$
is accepted because we can write a program which takes as input a representative of $x$, namely a program $p$ as described above, and outputs a natural number larger than $x$, for example $round(p(0)) + 1000$. However, the number $n$ will necessarily depend on $p$, and there is no way too make it depend only on $x$ (computably).
Let us have a look at the axiom of choice again:
$(\forall x \in X . \exists y \in Y . R(x,y)) \implies \exists f \in Y^X . \forall x \in X . R(x,f(x))$
We accept this if there is a program which takes as input a $p$ witnessing totality of $R$ and outputs a representative of a choice function $f$, as well as a witness that $\forall x \in X. R(x, f(x))$ holds. This is probematic because $p$ need not respect equality of $X$, whereas a representative for $f$ must respect equality. It is not clear where we could get it from, and in specific examples we can show that there isn't one. Already the following fails:
$(\forall x \in \mathbb{R} . \exists n \in \mathbb{N} . x < n) \implies \exists f \in \mathbb{N}^\mathbb{R} . \forall x \in \mathbb{R} . x < f(x)$.
Indeed, every computable map $f : \mathbb{R} \to \mathbb{N}$ is constant (because a non-constant one would allow us to write a Halting oracle).
However, if we specialize to countable choice
$(\forall n \in \mathbb{N} . \exists y \in Y . R(n,y)) \implies \exists f \in Y^\mathbb{N} . \forall n \in \mathbb{N} . R(n,f(n))$
then we can produce the desired program. Given $p$ that witnesses totality of $R$, define the following program $q$ that represents a choice function: $q$ takes as input a binary representation of a natural number $n$, possibly with leading zeroes, chops of the leading zeroes, and applies $p$. Now, even if $p$ did not respect equality of natural numbers, $q$ does because it applies $p$ to canonically chosen representatives.
In general, we will accept choice for those sets $X$ that have computable canonical representatives for their elements. Ok, this was a bit quick, but I hope I got the idea accross.
Let me finish with a general comment. Most working mathematicians cannot imagine alternative mathematical universes because they were thoroughly trained to think about only one mathemtical universe, namely classical set theory. As a result their mathematical intuition has fallen a victim to classical set theory. The first step towards understanding why someone might call into question a mathematical principle which seems obviously true to them, is to broaden their horizon by studying other mathematical universes. On a smaller scale this is quite obvious: one cannot make sense of non-Euclidean geometry by interpreting points and lines as those of the Euclidean plane. Similarly, you cannot understand in what way the axiom of choice could fail by interpreting it in classical set theory. You must switch to a different universe, even though you think there isn't one... Of course, this takes some effort, but it's a real eye-opener.
Here is a little progress towards AC.
Theorem.
ICF implies the dual Cantor-Schröder-Bernstein
theorem, that is $X$ surjects onto $Y$ and $Y$ surjects onto $X$,
then they are bijective.
Proof. You explain this in the edit to the question. If
$X\twoheadrightarrow Y$, then $2^Y\leq 2^X$ by taking pre-images,
and so if also $Y\twoheadrightarrow X$, then $2^X\leq 2^Y$ and so
$X\sim Y$ by ICF. QED
Theorem. ICF implies that there are no infinite D-finite
sets.
Proof. (This is a solution to the exercise that you mention.) If $A$ is infinite and
Dedekind-finite, then let $B$ be the set of all finite
non-repeating finite sequences from $A$. This is also D-finite,
since a countably infinite subset of $B$ easily gives rise to a
countably infinite subset of $A$. But meanwhile, $B$ surjects onto
$B+1$, since we can map the empty sequence to the new point, and
apply the shift map to chop off the first element of any sequence.
So $B$ and $B+1$ surject onto each other, and so by the dual
Cantor-Schöder-Bernstein result, they are bijective,
contradicting the fact that $B$ is D-finite. QED
Here is the new part:
Theorem. ICF implies that $\kappa^+$ injects into
$2^\kappa$ for every ordinal $\kappa$.
Proof. We may assume $\kappa$ is infinite. Notice that
$2^{\kappa^2}$ surjects onto $\kappa^+$, since every
$\alpha<\kappa$ is coded by a relation on $\kappa$. Since
$\kappa^2\sim\kappa$, this means
$2^\kappa\twoheadrightarrow\kappa^+$ and consequently
$2^{\kappa^+}\leq 2^{2^\kappa}$, by taking pre-images. It follows that
$2^{2^\kappa}=2^{2^\kappa}\cdot 2^{\kappa^+}=2^{2^\kappa+\kappa^+}$ and
so by ICF we get $2^\kappa+\kappa^+=2^\kappa$, which implies
$\kappa^+\leq 2^\kappa$, as desired. QED
This conclusion already contradicts AD, for example, since AD
implies that there is no $\omega_1$ sequence of distinct reals,
which violates the conclusion when $\kappa=\omega$. In particular,
this shows that ICF implies $\neg$AD, and so in every AD model, there are sets of different cardinalities, whose power sets are equinumerous.
Best Answer
The implication $ACC \implies UCC$ is irreversible in $ZF$. This follows again from the transfer theorem of Pincus:
$UCC$ is an injectively boundable statement, see note 103 in "Consequences of the axiom of choice" by Howard & Rubin. Right after the theorem in pp. 285 and its corollary, there are examples of statements of this kind, one of which is form 31 (which is precisely $UCC$). The fact that this is the case follows in turn from the application of lemma 3.5 in Howard, P.-Solski, J.: "The strength of the $\Delta$-system lemma", Notre Dame J. Formal Logic vol 34, pp. 100-106 - 1993
It is known that $¬ACC$ is boundable, and hence injectively boundable.
Then we can apply the transfer theorem of Pincus to the conjunction $UCC \wedge ¬ACC$ and we are done.
SECOND PROOF: Browsing through the "Consequences..." book I've just found another less direct proof of the same fact. I'm adding it here to avoid the interested person from looking it up for himself (especially since the AC website is not working these days). It involves form 9, known as $W_{\aleph_0}$: a set is finite if and only if it is Dedekind finite. Now, $ACC \implies W_{\aleph_0}$ is provable in $ZF$ (in fact this was already proved by Dedekind), while $UCC$ does not imply $W_{\aleph_0}$. The latter follows from the fact that in the basic Fraenkel model, $\mathcal{N}1$, $UCC$ is valid while $W_{\aleph_0}$ is not, and such a result is transferable by considerations of Pincus that can be found at Pincus, D.: "Zermelo-Fraenkel consistency results by Fraenkel Mostowski methods", J. of Symbolic Logic vol 37, pp. 721-743 - 1972 (doi:10.2307/2272420)