There are many, many reasons why kernels are important to group theory, but here's just one way of appreciating the kernel in a fairly isolated context.
If we zoom out a bit, any set-function $f: A \to B$ (here $A$ and $B$ are simply sets) naturally partitions $A$ into equivalence classes, and for $a \in A$, the equivalence class of $a$ is given by
$$[a] = \{a' \in A : f(a') = f(a)\};$$
the set of all elements of $A$ that get mapped to the same thing as $a$. The same logic applies if $f : G \to H$ isn't just a set function, but a homomorphism of groups.
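The partition into equivalence classes can be computed directly for a finite set. The following is a small illustrative sketch; the helper name `fibers` and the example map $a \mapsto a \bmod 3$ on $\{0, \dots, 8\}$ are assumptions chosen for illustration, not taken from the answer.

```python
from collections import defaultdict

def fibers(f, domain):
    """Group the elements of `domain` by their image under f.

    Each value in the result is one equivalence class [a] = {a' : f(a') = f(a)}.
    """
    classes = defaultdict(set)
    for a in domain:
        classes[f(a)].add(a)
    return dict(classes)

# Hypothetical example: a bare set function on A = {0, ..., 8}, f(a) = a mod 3.
classes = fibers(lambda a: a % 3, range(9))
print({y: sorted(c) for y, c in classes.items()})  # {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5, 8]}
```

The classes are pairwise disjoint and their union is all of $A$, which is exactly what "partition" means here.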
With the equivalence class notation, the kernel of $f$ is simply the equivalence class of the identity $e_G$ of $G$,
$$\ker f = [e_G],$$
since any homomorphism $f: G \to H$ always sends the identity $e_G$ of $G$ to the identity $e_H$ of $H$. What can we say about arbitrary $g, g' \in G$ such that $f(g) = f(g')$? That is, what can we say about the equivalence class $[g]$ for any $g \in G$?
Claim: For a homomorphism $f: G \to H$ and $g, g' \in G$, we have $f(g) = f(g')$ if and only if there exists some $k \in \ker f$ such that $gk = g'$; that is, $g'$ is obtained from $g$ by right-multiplying by an element of the kernel of $f$. In particular, $[g] = \{gk : k \in \ker f\} = g\ker f$, and since $k \mapsto gk$ is a bijection, $[g]$ has the same size as $\ker f$.
($\Longrightarrow$) Supposing $f(g) = f(g')$, note that there exists a unique $g^* = g^{-1}g' \in G$ such that $gg^* = g'$. Then
$$f(g) = f(g') = f(gg^*) = f(g)f(g^*),$$
and left-multiplication by $f(g)^{-1}$ shows that $e_H = f(g^*)$, hence $g^* \in \ker f$.
($\Longleftarrow$) Homework.
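The claim can be checked by brute force on a small example. This sketch assumes the additive group $\mathbb{Z}_{12}$ mapped onto $\mathbb{Z}_4$ by reduction mod $4$ (a choice made here for illustration); because the groups are written additively, the product $gk$ becomes the sum $g + k$.

```python
# Z_12 under addition mod 12, mapped onto Z_4 by reduction mod 4.
N, M = 12, 4
G = range(N)

def f(g):
    return g % M  # well defined on Z_12 because 4 divides 12

# f respects the group operation, so it is a homomorphism.
assert all(f((a + b) % N) == (f(a) + f(b)) % M for a in G for b in G)

kernel = {k for k in G if f(k) == 0}

for g in G:
    fiber = {h for h in G if f(h) == f(g)}   # the class [g]
    coset = {(g + k) % N for k in kernel}     # g + ker f (additive notation)
    assert fiber == coset
    assert len(fiber) == len(kernel)

print(sorted(kernel))  # [0, 4, 8]
```

Every class $[g]$ turns out to be the coset $g + \ker f$, and every class has three elements, matching $|\ker f| = 3$.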
If you've ever heard homomorphisms described as functions that "respect" the group operation(s), the size of the kernel measures just how "respectful" a given homomorphism is! A large kernel means that more of the structure of the group $G$ is "ignored" when transported to the group $H$; a small kernel means that little is lost.
Edit:
To see what "respectful" means, consider two homomorphisms out of $S_3$, the symmetric group of degree $3$. There's a sign homomorphism $\operatorname{sgn} : S_3 \to \{-1, 1\} = C_2$ to the multiplicative group $C_2$, sending each permutation to its sign. Its kernel is the alternating group $A_3 = \{1, (123), (132)\}$ of "even" permutations, and in the image $C_2$, almost all of the structure of $S_3$ is ignored; we forget everything but whether a permutation is even or odd.
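A quick computational check of the sign example follows. It is a sketch under the assumption that permutations are stored in one-line notation as tuples on $\{0, 1, 2\}$ (so the symbols $0, 1, 2$ stand in for $1, 2, 3$), and that the sign is computed from the parity of the number of inversions.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p); permutations are tuples on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(q)))

def sgn(p):
    """Sign of a permutation: +1 for an even number of inversions, -1 otherwise."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

S3 = list(permutations(range(3)))

# sgn respects composition, so it is a homomorphism S_3 -> {-1, 1}.
assert all(sgn(compose(p, q)) == sgn(p) * sgn(q) for p in S3 for q in S3)

kernel = [p for p in S3 if sgn(p) == 1]
print(kernel)  # [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
```

The kernel consists of the identity and the two $3$-cycles, i.e. $A_3$, and six permutations collapse onto just two values.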
On the other hand, we have a "copy" of $S_3$ as a subgroup of $S_4$ if we consider all permutations of $S_4$ that leave $4$ fixed. This leads to an "inclusion" homomorphism $\iota: S_3 \to S_4$, sending each permutation to its "copy" in $S_4$. This inclusion homomorphism has only the identity of $S_3$ in its kernel, and is considerably more "respectful" than the sign homomorphism; every bit of information about $S_3$ shows up in the image $\iota(S_3)$.
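The inclusion can be checked the same way. This sketch again assumes tuples on $\{0, 1, 2\}$ for $S_3$ and on $\{0, 1, 2, 3\}$ for $S_4$, with the symbol $3$ playing the role of the fixed point $4$; the helper name `include` is hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def include(p):
    """Send a permutation of {0, 1, 2} to the permutation of {0, 1, 2, 3} that also fixes 3."""
    return p + (3,)

S3 = list(permutations(range(3)))
identity4 = (0, 1, 2, 3)

# The inclusion respects composition, so it is a homomorphism S_3 -> S_4.
assert all(include(compose(p, q)) == compose(include(p), include(q)) for p in S3 for q in S3)

kernel = [p for p in S3 if include(p) == identity4]
image = {include(p) for p in S3}
print(kernel)      # [(0, 1, 2)]
print(len(image))  # 6
```

Only the identity lands on the identity, and all six elements of $S_3$ have distinct images, so nothing about $S_3$ is lost.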
Best Answer
It should not be surprising that if a homomorphism $\phi$ is injective, then its kernel is trivial. After all, injectivity requires that the preimage of every element of the image be unique. (In particular, a homomorphism always sends the identity to the identity, so injectivity forces the identity to be the only element of the kernel.)
It is the converse -- that the triviality of the kernel is sufficient for injectivity -- that is less obvious. Mark Bennett has given the core idea: symbolically, if $\phi(a)$ equals $\phi(b)$, then their (group) inverses are equal too, and thus
$$\phi(ab^{-1})=\phi(a)\phi(b^{-1})=\phi(a)\phi(b)^{-1}=e$$
The first and second steps of this calculation work only because of the homomorphism property of $\phi$. Now, if the kernel is trivial, we conclude that $ab^{-1}=e$, so $a=b$, and injectivity follows.
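The equivalence "trivial kernel if and only if injective" can be tested exhaustively on a family of small homomorphisms. This sketch assumes the maps $x \mapsto a x \bmod 12$ on the additive group $\mathbb{Z}_{12}$, each of which is a homomorphism; the helpers `is_injective` and `kernel` are illustrative names.

```python
def is_injective(f, domain):
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def kernel(f, domain, identity):
    return [x for x in domain if f(x) == identity]

n = 12
injective_multipliers = []
for a in range(n):
    f = lambda x, a=a: (a * x) % n  # x -> a*x is a homomorphism Z_12 -> Z_12
    trivial_kernel = kernel(f, range(n), 0) == [0]
    # The criterion from the argument above: injective exactly when the kernel is trivial.
    assert is_injective(f, range(n)) == trivial_kernel
    if trivial_kernel:
        injective_multipliers.append(a)

print(injective_multipliers)  # [1, 5, 7, 11]
```

The surviving multipliers are precisely those coprime to $12$, as one would expect for automorphisms of $\mathbb{Z}_{12}$.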
Here is a nonstandard alternative way to think about this result; it is good for intuition and illustrates another Big Idea. Say a function $f$ is injective at $y$, an element of the image, if the preimage $f^{-1}(y)$ has only one element. This is a "local" property; a function might be injective at $y_1$ but not at $y_2$.
For general functions between sets, injectivity at a point is not sufficient to deduce global injectivity (that is, injectivity at every point). But if $f$ is a group homomorphism, not just a bare function between sets, then a special type of local injectivity, namely injectivity at the identity element, is sufficient to deduce global injectivity: every fiber $f^{-1}(f(a))$ is the coset $a\ker f$, so every fiber has the same size as the fiber over the identity. The extra structure provided by the homomorphism property and the identity element allows us to parlay a local phenomenon into a global one.
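The local-versus-global contrast can be seen by tabulating fiber sizes. In this sketch the set function `g` (which sends $0$ to $0$ and everything else to $1$) and the homomorphisms $x \mapsto a x \bmod 12$ on $\mathbb{Z}_{12}$ are examples chosen here, not taken from the answer.

```python
from collections import Counter

def fiber_sizes(f, domain):
    """Map each image point y to |f^{-1}(y)|."""
    return dict(Counter(f(x) for x in domain))

# A bare set function: injective at y = 0 but not at y = 1.
g = lambda x: 0 if x == 0 else 1
print(fiber_sizes(g, range(6)))  # {0: 1, 1: 5}

# Homomorphisms Z_12 -> Z_12: every fiber has the size of the fiber over 0 (the kernel).
for a in (3, 5):
    h = lambda x, a=a: (a * x) % 12
    sizes = fiber_sizes(h, range(12))
    assert set(sizes.values()) == {sizes[0]}
    print(a, sizes[0])  # prints "3 3", then "5 1"
```

For the set function, one singleton fiber says nothing about the others; for the homomorphisms, the fiber over the identity fixes the size of every fiber, so a singleton kernel forces injectivity.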
This is a taste of a general class of "local-to-global" results that show how local phenomena can be used to deduce global structure.