There is no universal and objective definition of what is a measurable subset of a general space $X$.
The general concept of a measurable subset has its origins in the problem of measure in Euclidean space:
Problem of measure: Given an object $A\subset\mathbb R^n$,
how does one assign a measure $m(A)\in[0,\infty]$ to $A$? (In the case $n=1,2$ and $3$,
the measure $m(A)$ is traditionally referred to as the length,
the area, and the volume of $A$, respectively).
When the objects considered are very simple,
this question is very easy to answer.
For example, given a line segment $A=[a,b]\subset\mathbb R$,
the measure of $A$ should obviously be $m(A)=b-a$.
Given an $n$-dimensional rectangle
$$A=[a_1,b_1]\times\cdots\times[a_n,b_n]\subset\mathbb R^n,$$
the answer is equally obvious:
$$m(A)=\prod_{i=1}^n(b_i-a_i).$$
Then,
one can easily extend the measure of rectangles to slightly more general sets,
such as disjoint unions of rectangles
$$A=R_1\dot\cup\cdots\dot\cup R_k$$
by assigning
$$m(A)=\sum_{i=1}^km(R_i).$$
(Indeed,
for a theory of measure to make any kind of geometrical sense,
the measure of a union of disjoint parts should be the sum of the measures of the constituent parts.)
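As a small illustration of the two formulas above, here is a minimal Python sketch. The names `rect_measure` and `disjoint_union_measure` are mine, a rectangle is represented as a list of `(a_i, b_i)` pairs, and disjointness of the pieces is assumed rather than checked.

```python
from math import prod

def rect_measure(rect):
    """Measure of a box given as a list of (a_i, b_i) intervals: the product of side lengths."""
    return prod(b - a for a, b in rect)

def disjoint_union_measure(rects):
    """Measure of a disjoint union of boxes: the sum of the measures of the pieces.

    The boxes are assumed to be pairwise disjoint; this is not verified.
    """
    return sum(rect_measure(r) for r in rects)

segment = [(2.0, 5.0)]                          # the interval [2, 5]
box = [(0.0, 2.0), (0.0, 3.0), (0.0, 4.0)]      # a 2 x 3 x 4 box in R^3
two_pieces = [
    [(0.0, 1.0), (0.0, 1.0)],                   # unit square
    [(2.0, 4.0), (0.0, 1.0)],                   # 2 x 1 rectangle, disjoint from the square
]

print(rect_measure(segment))                # 3.0
print(rect_measure(box))                    # 24.0
print(disjoint_union_measure(two_pieces))   # 3.0
```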
The real problem comes when trying to measure more complicated subsets of $\mathbb R^n$.
A classical solution to the measure problem consists in attempting to approximate the measure of a complicated set using simple sets.
More precisely,
suppose we have a class of simple sets $S$ which we know how to measure (these would contain rectangles and finite unions of rectangles, for example).
Then,
given some arbitrary set $A$,
we can define an inner measure $m_I(A)$ and an outer measure $m_O(A)$ of $A$ by letting
$$m_I(A)=\sup\{m(E):E\subset A,~E\in S\}\text{ and }m_O(A)=\inf\{m(E):E\supset A,~E\in S\}.$$
(Note that
the inner and outer measures of sets in $S$ are clearly the same as the measure we have already assigned to them.)
In this framework,
one calls a set $A\subset\mathbb R^n$ measurable if $m_O(A)=m_I(A)$,
in which case we assign $m(A)=m_O(A)=m_I(A)$.
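To see the inner/outer approximation at work, here is a rough Python sketch for the closed unit disk in $\mathbb R^2$. It is only an illustration under my own assumptions: the simple sets are unions of grid squares of side $1/n$ (squares that overlap only along edges, which contribute no area), and the function name `disk_bounds` is hypothetical. The first returned value is the measure of one particular simple set contained in the disk, so it is a lower bound for $m_I$; the second is the measure of one particular simple set containing the disk, so it is an upper bound for $m_O$.

```python
import math

def disk_bounds(n):
    """Approximate the closed unit disk from inside and outside by grid squares of side 1/n.

    Returns (inner, outer): the total area of the grid squares contained in the disk,
    and the total area of the grid squares that meet the disk.
    """
    inside = 0
    meeting = 0
    for i in range(-n, n):
        for j in range(-n, n):
            # In grid units the square is [i, i+1] x [j, j+1] and the disk has radius n.
            # Farthest corner from the origin: the square lies in the disk iff this corner does.
            far_x = max(abs(i), abs(i + 1))
            far_y = max(abs(j), abs(j + 1))
            # Closest point of the square to the origin: the square meets the disk iff it does.
            near_x = min(abs(i), abs(i + 1))
            near_y = min(abs(j), abs(j + 1))
            if far_x ** 2 + far_y ** 2 <= n * n:
                inside += 1
            if near_x ** 2 + near_y ** 2 <= n * n:
                meeting += 1
    cell_area = 1.0 / (n * n)
    return inside * cell_area, meeting * cell_area

for n in (10, 50, 200):
    inner, outer = disk_bounds(n)
    print(n, inner, outer, math.pi)
```

As $n$ grows, the two bounds squeeze together around $\pi$, which is what it means for the disk to be measurable in this framework.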
In other words,
we call a set measurable if our theory of measure is capable of giving a sensible answer to "what is the measure of $A$?"
The solution of the measure problem I discussed in the previous paragraph gave rise to the Jordan theory of measure,
as well as the more modern Lebesgue measure.
The concepts of outer measure and general measures you have written down in your questions are answers to generalizations of the problem of measure to arbitrary spaces $X$,
with the intent of extending the theory of Lebesgue integration to those spaces.
The exact axioms (i.e., the definitions of a $\sigma$-algebra and measure)
are in place in order to ensure that we obtain a theory of integration that is similar to the theory of Lebesgue measure/integration,
that is,
with similar theorems such as
countable subadditivity,
continuity from above/below,
monotone/dominated convergence, etc.
Going into more detail would require a lot of explanation. If you'd like to know more, I personally recommend reading sections 1.1 - 1.4 of An introduction to measure theory by Terence Tao.
I only really started to understand measure theory when I read it.
Yes, a measure space has a measure function that is defined on a $\sigma$-algebra. So having a $\sigma$-algebra is a prerequisite for having a measure, and hence a set with just a $\sigma$-algebra is called "measurable": we could define a measure on it, although not always one with the properties we would like. The $\sigma$-algebra is the "wish list": all sets we would like to be able to measure using a measure function. We anticipate that wish and call all sets in the $\sigma$-algebra "measurable".
It's analogous to having a set with a topology and calling the members of the topology "open". We have a set with a $\sigma$-algebra and call its members "measurable", or a set with a bornology and call its members "bounded", or a set with a convexity structure and call its members "convex", and so on.
On the power set of a nonempty set $X$ we can always define a measure: pick $p \in X$ and define $\mu(A) = 1$ if $p \in A$ and $\mu(A) = 0$ if $p \notin A$. Pretty boring, but a valid measure.
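For a finite set one can check this point-mass measure by brute force. The sketch below is only an illustration: the helper names `powerset` and `dirac` are mine, and on a finite set countable additivity reduces to additivity over pairs of disjoint sets, which is what the loop verifies.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def dirac(p):
    """Point mass at p: mu(A) = 1 if p is in A, and 0 otherwise."""
    return lambda A: 1 if p in A else 0

X = {"a", "b", "c"}
mu = dirac("b")
subsets = powerset(X)

# The empty set has measure zero.
assert mu(frozenset()) == 0

# Additivity on every pair of disjoint subsets.
for A in subsets:
    for B in subsets:
        if A.isdisjoint(B):
            assert mu(A | B) == mu(A) + mu(B)

print(len(subsets))  # 8
```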
On the power set of the reals we cannot define a measure that gives all intervals the Lebesgue measure (so $\mu([a,b]) = b-a$ for $a < b$) and also is translation-invariant (so that $\mu(A+x) = \mu(A)$ for all $A$ and all $x \in \mathbb{R}$). We always have measures we can define, but not always "nice" measures with extra good properties.
Formally, measure (resp. probability) theory requires us to work with a triple $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the space we are working on, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $P$ is a (probability) measure which maps elements of $\mathcal{F}$ to numbers (between $0$ and $1$). We call the elements of $\mathcal{F}$ the "$\mathcal{F}$-measurable sets". For any non-trivial $\Omega$, you will have many potential $\sigma$-algebras that you can use in the place of $\mathcal{F}$. As you say, one option is to take $\mathcal{F} = 2^\Omega$ (the power set of $\Omega$) to be our $\sigma$-algebra. The problem with this choice is that every subset of $\Omega$ is in the power set, so everything is measurable here. Why is that an issue? Among other things, it's often too big for $P$ to have nice properties. In many (most, honestly) cases, the nice properties we want $P$ to have are what really drive the probability, not the particulars of $\mathcal{F}$.
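To make the triple concrete, here is a toy example on a four-point space, written as a Python sketch. Everything in it (the function `is_sigma_algebra`, the particular family `coarse`, the probabilities) is made up for illustration; on a finite set, closure under pairwise unions is enough to get closure under countable unions.

```python
def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(A) for A in family}
    if omega not in family:
        return False
    # Closed under complements.
    if any(omega - A not in family for A in family):
        return False
    # Closed under (pairwise, hence finite) unions.
    return all(A | B in family for A in family for B in family)

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
broken = [set(), {1}, {1, 2, 3, 4}]          # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, coarse))       # True
print(is_sigma_algebra(omega, broken))       # False

# A probability measure only needs values on the sets in the sigma-algebra.
P = {
    frozenset(): 0.0,
    frozenset({1, 2}): 0.25,
    frozenset({3, 4}): 0.75,
    frozenset(omega): 1.0,
}

# {1} is a perfectly good subset of omega, but it is not measurable
# with respect to `coarse`, so P assigns it no value at all.
print(frozenset({1}) in P)                   # False
```

The same $\Omega$ with its full power set is also a valid choice; the point is that "measurable" always refers to whichever $\sigma$-algebra was fixed.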
Stefan gives the standard non-probabilistic example of this in the comments. The Lebesgue measure, which is the natural notion of volume on the real line, is not compatible with the power set as the $\sigma$-algebra in our triple, so we need to pick a new one. The Borel $\sigma$-algebra is defined as the smallest $\sigma$-algebra containing the open intervals (which had better be measurable if we are going to define volume). Since this $\sigma$-algebra is compatible with the intuitive notion of volume, it is the smallest $\sigma$-algebra we can choose with the property that $\mu((a,b)) = b-a$ for all open intervals $(a,b)$. Why not stop here? Not all subsets of Borel sets of measure $0$ are Borel measurable, and it is often nice for the sake of theory to not have to worry about those sets. The Lebesgue $\sigma$-algebra is what you get if you insist that all subsets of sets of measure zero are measurable. In this case, as in many cases, because this $\sigma$-algebra is so natural we often drop the formalism and just say that a set is "measurable" or "not measurable" on the real line, when what we really mean is that it is or is not measurable with respect to the Lebesgue $\sigma$-algebra. I believe that this last issue is the source of your confusion. Whatever you were reading omitted the fact that it was referring to the Lebesgue $\sigma$-algebra.
It is possible, though not entirely trivial, to construct a set which is Lebesgue measurable but not Borel measurable. In general, most sets you can write down will end up being Borel. By contrast, constructing sets which are not Lebesgue measurable requires using something like the axiom of choice. Analysts are fond of saying that if you can write it down explicitly, it is Lebesgue measurable.
Let me make one quick comment about why probabilists like to use the Borel $\sigma$-algebra rather than the Lebesgue $\sigma$-algebra. For an analyst, the definition of a function being measurable is that the inverse image of every open set is measurable. Since probabilists don't require our spaces to have topologies, this really doesn't work for us. For a probabilist, the definition of a function being measurable is that the inverse image of every measurable set is measurable. The Borel $\sigma$-algebra has the nice property that if you compose two Borel measurable functions, you get another Borel measurable function under either definition. This property fails badly for Lebesgue measurable functions with the analysts' definition of measurable.
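The probabilist's definition is easy to test on finite spaces. The following Python sketch is a toy illustration only: the helpers `preimage` and `is_measurable` and the example functions are my own, and a finite example cannot show the Borel versus Lebesgue subtlety, just the "inverse images of measurable sets are measurable" condition itself.

```python
def preimage(f, domain, B):
    """The set of points of the domain that f sends into B."""
    return frozenset(x for x in domain if f(x) in B)

def is_measurable(f, domain, F, G):
    """True if the preimage under f of every set in G belongs to F."""
    F = {frozenset(A) for A in F}
    return all(preimage(f, domain, B) in F for B in G)

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]   # sigma-algebra on omega
G = [set(), {0}, {1}, {0, 1}]               # power set of the target {0, 1}

def block_indicator(x):
    """Constant on each of the blocks {1, 2} and {3, 4}."""
    return 0 if x <= 2 else 1

def parity(x):
    """Separates 1 from 2, which F cannot tell apart."""
    return x % 2

print(is_measurable(block_indicator, omega, F, G))  # True
print(is_measurable(parity, omega, F, G))           # False
```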