As was already said, the term "ideal" comes from Kummer's ideal numbers (more precisely, "ideal complex numbers", since Kummer was concerned with factorizations of algebraic integers lying in the complex numbers). I'll try to give a brief intuition not explicitly mentioned here already.
When factoring, say, 60, you find 2 "different" factorizations: $60=15\times 4=12\times 5$. This does not contradict unique factorization in integers since you have not factored "enough": after factoring all the numbers as much as possible you obtain the unique factorization $60=2\times 2\times 3\times 5$.
However, in the context of algebraic integers this does not always hold. The famous example is in $\mathbb{Z}[\sqrt{-5}]$. There you have $6=2\times 3=(1-\sqrt{-5})\times(1+\sqrt{-5})$ but the factors are irreducible, so unique factorization fails.
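(To see that the factors really are irreducible, one standard tool is the norm $N(a+b\sqrt{-5})=a^2+5b^2$, which is multiplicative. Here
$$N(2)=4,\qquad N(3)=9,\qquad N(1\pm\sqrt{-5})=6,$$
and since $a^2+5b^2=2$ and $a^2+5b^2=3$ have no integer solutions, no element of $\mathbb{Z}[\sqrt{-5}]$ has norm $2$ or $3$; so none of the four factors can split into a product of two non-units.)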
Kummer's idea was that here, too, the problem is that the factors have not been factored "enough". His approach was to assume that there are better, "ideal" factors for which unique factorization holds.
It is obvious that there is something problematic here - you need to construct such ideal numbers, prove their existence, etc. However, there is another approach, due to Dedekind. Dedekind defined not the ideal numbers themselves, but the sets of elements they divide. For example, instead of talking about "2" you can talk about the set $\{0,2,-2,4,-4,6,-6,\dots\}$ of even numbers in the integers - the ideal generated by 2.
Dedekind noted that this concept of "being divided by" can be characterized by two properties:
- If some number (ideal or not) divides $a$ and $b$, it divides $a+b$.
- If some number (ideal or not) divides $a$, it divides $\lambda \times a$ for all $\lambda$.
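As a quick sanity check, here is a small Python sketch verifying both closure properties for the ideal generated by 2 in the integers, on a finite sample (the helper name `in_ideal` is mine, just for illustration):

```python
# Sanity check of Dedekind's two closure properties for the set of
# multiples of 2 in the integers (the ideal generated by 2).

def in_ideal(n, g=2):
    """Membership test for the ideal generated by g in the integers."""
    return n % g == 0

sample = range(-10, 11)

# Property 1: if the ideal contains a and b, it contains a + b.
assert all(in_ideal(a + b) for a in sample for b in sample
           if in_ideal(a) and in_ideal(b))

# Property 2: if the ideal contains a, it contains lambda * a for any lambda.
assert all(in_ideal(lam * a) for a in sample for lam in sample
           if in_ideal(a))

print("both closure properties hold on the sample")
```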
So he defined ideals using these two properties. They turned out to be just enough to prove the grand theorem at the foundation of algebraic number theory - that in Dedekind domains (and so in rings of algebraic integers) every ideal factors uniquely into a product of prime ideals (this applies to elements as well, since an element can be identified with the ideal it generates).
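For the $\mathbb{Z}[\sqrt{-5}]$ example above, the two factorizations of $6$ are reconciled at the level of ideals: writing $\mathfrak{p}=(2,1+\sqrt{-5})$, $\mathfrak{q}=(3,1+\sqrt{-5})$, $\bar{\mathfrak{q}}=(3,1-\sqrt{-5})$, one can check that
$$(2)=\mathfrak{p}^2,\qquad (3)=\mathfrak{q}\bar{\mathfrak{q}},\qquad (1+\sqrt{-5})=\mathfrak{p}\mathfrak{q},\qquad (1-\sqrt{-5})=\mathfrak{p}\bar{\mathfrak{q}},$$
so both factorizations of $6$ refine to the single prime ideal factorization $(6)=\mathfrak{p}^2\mathfrak{q}\bar{\mathfrak{q}}$.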
This is quite orthogonal to the usage of ideals usually encountered in an undergraduate algebra course, where ideals pop up naturally (and more generally) as kernels of ring homomorphisms - and from that perspective the name "ideal" is indeed confusing.
For your first question, $S_X$ is defined as the group of bijections from $X$ to $X$. The group operation is function composition. If $X$ is finite, say with $n$ elements, then the groups $S_X$ and $S_n$ are obviously (noncanonically) isomorphic.
There is always at least one group homomorphism from $G$ to $S_X$, namely the one which sends everything in $G$ to the identity map in $S_X$. This is the "stupid group action" which doesn't do anything ($g.x = x$ for all $g \in G$ and $x \in X$).
There are typically many homomorphisms from $G$ to $S_X$. Therefore, there are typically many group actions of $G$ on $X$.
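To make "typically many" concrete, here is a brute-force count (in Python, with names of my choosing) of the actions of the two-element group $C_2 = \{e, g\}$ on a $3$-element set: such an action is determined by the permutation assigned to $g$, and the only constraint from being a homomorphism is that this permutation squares to the identity.

```python
# Count the homomorphisms C_2 -> S_3, i.e. the actions of C_2 on
# {0, 1, 2}: each is determined by an involution (a permutation that
# squares to the identity).
from itertools import permutations

identity = (0, 1, 2)

def compose(p, q):
    """Composition of permutations written as tuples: (p . q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

involutions = [p for p in permutations(range(3)) if compose(p, p) == identity]
print(len(involutions))  # prints 4: the identity and the three transpositions
```

So even in this tiny case there are four distinct actions, only one of which is the trivial one.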
"Once we specify a group and a set, how do we find such homomorphisms?"
Good question. In many settings, the group action comes up naturally. It's not like people are taking random sets $X$ and $G$ and asking, "I wonder how many different actions of $G$ I can find on $X$." I mean maybe some people are doing this, but usually the group action is a convenient way to describe some existing phenomenon they are trying to study.
Example: Let $X = \{ z \in \mathbb C: \operatorname{Im}(z) > 0\}$ be the upper half plane, and let $G = \operatorname{SL}_2(\mathbb Z)$ be the group of integer matrices with determinant $1$. There is a natural group action of $G$ on $X$ by
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.z = \frac{az+b}{cz+d}.$$
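As a sanity check that this formula really defines an action (i.e., that the resulting map $G \to S_X$ is a homomorphism), one can verify numerically that acting by a product of matrices agrees with acting twice. A small Python sketch, with helper names of my choosing:

```python
# Check the action axiom (AB).z == A.(B.z) for the Mobius action of
# 2x2 matrices on the upper half plane.

def act(m, z):
    """Apply the Mobius transformation of matrix m = ((a, b), (c, d)) to z."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def matmul(m, n):
    """2x2 matrix multiplication."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

A = ((1, 1), (0, 1))   # translation z -> z + 1
B = ((0, -1), (1, 0))  # inversion z -> -1/z
z = 0.5 + 2j           # a point in the upper half plane

assert abs(act(matmul(A, B), z) - act(A, act(B, z))) < 1e-12
print("action axiom holds for this sample")
```

(That the upper half plane is preserved follows from the identity $\operatorname{Im}\big(\frac{az+b}{cz+d}\big) = \frac{\operatorname{Im}(z)}{|cz+d|^2}$ when $ad-bc=1$.)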
This came up (I believe) when people were working out problems like describing all holomorphic bijections from the Riemann sphere to itself, and it turns out that they all look like the above (where the matrix can be anything in $\operatorname{GL}_2(\mathbb C)$).
One example where people are taking certain groups $G$ and certain sets $X$, and asking what are the ways $G$ can act on $X$, is when $X$ is a vector space. The bijections from $X$ to itself coming from the elements of $G$ are required to also be linear transformations on $X$. Such group actions are called representations. Representation theory includes the problem of describing all representations of a given group on a given vector space, and this turns out to be an extremely difficult problem in general. Finding such group actions is no simple matter.
There are many many reasons why kernels are important to group theory, but here's just one way of appreciating the kernel in a fairly isolated context.
If we zoom out a bit, any set-function $f: A \to B$ (here $A$ and $B$ are simply sets) naturally partitions $A$ into equivalence classes, and for $a \in A$, the equivalence class of $a$ is given by
$$[a] = \{a' \in A : f(a') = f(a)\};$$
the set of all elements of $A$ that get mapped to the same thing as $a$. The same logic applies if $f : G \to H$ isn't just a set function, but a homomorphism of groups.
With the equivalence class notation, the kernel of $f$ is simply the equivalence class of the identity $e_G$ of $G$,
$$\ker f = [e_G],$$
since any homomorphism $f: G \to H$ always sends the identity $e_G$ of $G$ to the identity $e_H$ of $H$. What can we say about arbitrary $g, g' \in G$ such that $f(g) = f(g')$? That is, what can we say about the equivalence class $[g]$ for any $g \in G$?
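(The answer is short: $f(g) = f(g')$ iff $f(g^{-1}g') = e_H$, i.e. iff $g^{-1}g' \in \ker f$, so
$$[g] = g \ker f,$$
a coset of the kernel. In particular, every equivalence class has exactly $|\ker f|$ elements.)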
If you've ever heard homomorphisms described as functions that "respect" the group operation(s), the size of the kernel is a measure of just how "respectful" a given homomorphism is! A large kernel means that more of the structure of the group $G$ is "ignored" when transported to the group $H$.
Edit:
For "respectful", imagine two situations, considering $S_3$, the symmetric group of degree $3$. There's a sign homomorphism $\operatorname{sgn} : S_3 \to \{-1, 1\} = C_2$ to the multiplicative group $C_2$ sending each permutation to its sign. Its kernel is the alternating group $A_3 = \{1, (123), (132)\}$ of "even" permutations, and in the image $C_2$, almost all of the structure of $S_3$ is ignored; we forget everything but whether a permutation is even or odd.
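This kernel can be computed by brute force; here is a short Python sketch (representing permutations of $\{0,1,2\}$ as tuples, with the sign computed from the inversion count, $\operatorname{sgn}(p) = (-1)^{\#\text{inversions}}$):

```python
# Compute the kernel of sgn : S_3 -> {+1, -1} by brute force.
from itertools import permutations

def sgn(p):
    """Sign of a permutation tuple: (-1) to the number of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return -1 if inversions % 2 else 1

kernel = [p for p in permutations(range(3)) if sgn(p) == 1]
print(sorted(kernel))  # the three even permutations, i.e. A_3
```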
On the other hand, we have a "copy" of $S_3$ as a subgroup of $S_4$ if we consider all permutations of $S_4$ that leave $4$ fixed. This leads to an "inclusion" homomorphism $\iota: S_3 \to S_4$, sending each permutation to its "copy" in $S_4$. This inclusion homomorphism has only the identity of $S_3$ in its kernel, and is considerably more "respectful" than the sign homomorphism; every bit of information about $S_3$ shows up in the image $\iota(S_3)$.