This got pretty long but it's the way I think about it.
A $\sigma$-algebra represents information. Formally it's a set of events, but you can think of it as a set of questions you know the answers to.
Conditional expectation is a way of making sense of the idea that if you know some information (represented by a $\sigma$-algebra) you get a new probability distribution conditioned on that information.
So if my $\sigma$-algebra represents a list of questions, I want to associate a probability distribution with every possible set of answers. Doing this rigorously presents all sorts of problems, and conditional expectation is a formal tool to get around them. To lay out the basic idea I'm going to look at the finite case in more detail.
Suppose you have a probability space $(\Omega, \mathcal F, \mathbb P)$. I can generate a finite $\sigma$-algebra $\mathcal G$ from a finite set of events $(E_1, \dots, E_n)$.
We can interpret this another way, suppose I've chosen an element $\omega\in\Omega$ and you're trying to guess what it is. To make it easier I'm going to let you ask $n$ questions.
You have to choose all $n$ questions before you start asking them in this version of the game.
You've chosen the questions "is $\omega$ in $E_1$?" ... "is $\omega$ in $E_n$?".
When you've asked all your questions you have a conditional probability distribution on $\Omega$. There are $2^n$ different sets of answers, so there are up to $2^n$ different probability distributions to consider (some combinations of answers may occur with probability $0$).
So we could have a function from the set of answer sequences to the set of probability distributions on $\Omega$. But as some combinations of answers may not yield a well-defined conditional probability, it's better to think of this as a function from $\Omega$ to the set of probability distributions, defined $\mathbb P$-almost everywhere. That is, every $\omega$ gives me a set of answers, so I associate with $\omega$ the probability distribution conditioned on those answers. As there are only finitely many answer sequences, with probability one I get a sequence of answers that occurs with positive probability, so I get a well-defined conditional distribution with probability one.
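As a concrete sketch of this finite picture, here is the answer-pattern construction in code. The six-point uniform space and the two events $E_1, E_2$ are my own illustrative choices, not anything fixed by the discussion above:

```python
from fractions import Fraction

# A toy finite probability space: Omega = {0,...,5} with the uniform
# measure. The space and the two events E_1, E_2 are made-up
# illustrative choices, not fixed by the text.
omega = list(range(6))
p = {w: Fraction(1, 6) for w in omega}
events = [{0, 1, 2}, {1, 3}]

def answers(w):
    """The answer pattern to 'is w in E_i?' for each event E_i."""
    return tuple(w in e for e in events)

# Group outcomes by answer pattern; the groups are the atoms of the
# sigma-algebra G generated by the events. Answer patterns that never
# occur (probability 0) simply don't appear as keys.
atoms = {}
for w in omega:
    atoms.setdefault(answers(w), set()).add(w)

# The conditional distribution given each answer pattern that occurs
# with positive probability.
conditional = {}
for pattern, atom in atoms.items():
    mass = sum(p[w] for w in atom)
    conditional[pattern] = {w: p[w] / mass for w in atom}
```

Each key of `atoms` is one answer sequence, and `conditional` only has entries for sequences of positive probability, matching the almost-everywhere caveat above.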
Now let $\ell_1$ be the set of integrable $\mathcal F$-measurable functions.
We can always define a probability distribution in terms of its expectation operator, which is a linear functional $\ell_1\to[-\infty,\infty]$.
So if I want to define a useful mathematical object that associates a probability distribution with (almost) every $\omega$ I can define a conditional expectation operator
$$\mathbb E(\circ |\mathcal G)(\circ): \ell_1\times\Omega\to[-\infty,\infty].$$
I can think of this in two ways, either as assigning an expectation operator to every $\omega\in\Omega$ or associating a random variable $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$ with every $f\in\ell_1$.
As I've only defined my conditional distributions for almost every $\omega$ it's better to use the second idea and think of conditional expectation as a map $\ell_1\to\ell_1$, because random variables only need to be defined almost everywhere.
So to get around the almost-everywhere problem, we say a well-defined function $\ell_1\times\Omega\to[-\infty,\infty]$ is a conditional expectation if it gives the right random variables $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$.
Since we have a choice of different versions of conditional expectation, we need a test to check whether a given operator gives the right random variables.
In this case we need to find necessary and sufficient conditions for a conditional expectation to agree with the classical case almost everywhere.
Notice three things. Firstly, for every $f\in\ell_1$ the conditional expectation $\mathbb E(f|\mathcal G):\Omega\to[-\infty,\infty]$ must be $\mathcal G$-measurable, because $\mathbb E(\circ|\mathcal G)(\omega)$ is an expectation operator associated with a conditional probability distribution which depends only on the answers to the $n$ questions.
Secondly you can check that for every function $f$ we must have $\mathbb E\left(\mathbb E(f|\mathcal G)\right) = \mathbb E(f)$.
Thirdly, if $g$ is a $\mathcal G$-measurable function then $\mathbb E(fg|\mathcal G)(\omega) = g(\omega)\mathbb E(f|\mathcal G)(\omega)$, because $g$ is almost surely constant on each of the (finitely many) atoms of $\mathcal G$, and hence constant with respect to each expectation operator $\mathbb E(\circ|\mathcal G)(\omega)$.
You should be able to convince yourself that for finite $\mathcal G$ the classical definition of conditional expectation is the only function that satisfies these three conditions.
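To make the three conditions concrete, here is a sketch that builds the classical conditional expectation on a toy space (the six-point uniform space, the events, and the choices of $f$ and $g$ are illustrative assumptions) and on which all three conditions can be checked directly:

```python
from fractions import Fraction

# Toy uniform space and generating events (illustrative choices,
# not fixed by the text).
omega = list(range(6))
p = {w: Fraction(1, 6) for w in omega}
events = [{0, 1, 2}, {1, 3}]

# Atoms of G: outcomes grouped by their answer pattern.
atoms = {}
for w in omega:
    atoms.setdefault(tuple(w in e for e in events), set()).add(w)

def expect(f):
    """Ordinary expectation E(f)."""
    return sum(f(w) * p[w] for w in omega)

def cond_expect(f):
    """Classical E(f|G): on each atom, the average of f under the
    conditional distribution on that atom."""
    values = {}
    for atom in atoms.values():
        mass = sum(p[w] for w in atom)
        avg = sum(f(w) * p[w] for w in atom) / mass
        for w in atom:
            values[w] = avg
    return lambda w: values[w]

f = lambda w: w                             # an arbitrary integrable f
g = lambda w: sum(w in e for e in events)   # G-measurable: a function of the answers

# 1) G-measurability: E(f|G) is constant on every atom.
# 2) Tower property:  E(E(f|G)) = E(f).
# 3) Pulling out g:   E(fg|G)(w) = g(w) * E(f|G)(w) for every w.
```

Note that `g` only depends on the answer pattern of `w`, which is exactly what $\mathcal G$-measurability means on this finite space.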
As conditional expectation is only defined almost everywhere, the pointwise identity in the third condition doesn't quite make sense as stated. But as we have a completely free choice of $\mathcal G$-measurable $g$, we can combine the last two conditions to get
$$\mathbb E\left( \mathbb E(fg|\mathcal G)\right) = \mathbb E\left(\mathbb E(f|\mathcal G)g\right).$$
Again convince yourself that anything satisfying this must agree with the classical conditional expectation almost everywhere.
So for a finite $\sigma$-algebra the classical conditional expectation
can be described as the only $\mathcal G$-measurable function that satisfies the condition above.
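You can also see this uniqueness directly in the finite case: plugging the indicator of an atom in for $g$ pins down the value of $\mathbb E(f|\mathcal G)$ on that atom. A sketch, on a made-up six-point uniform space (the space, events, and $f$ are illustrative assumptions):

```python
from fractions import Fraction

# Toy uniform space and generating events (illustrative choices).
omega = list(range(6))
p = {w: Fraction(1, 6) for w in omega}
events = [{0, 1, 2}, {1, 3}]

atoms = {}
for w in omega:
    atoms.setdefault(tuple(w in e for e in events), set()).add(w)

f = lambda w: w ** 2  # an arbitrary test function

# Taking g = indicator of an atom A in E(E(fg|G)) = E(E(f|G)g) forces
# E(f|G) to equal E(f on A)/P(A) on A, i.e. the classical conditional
# expectation, on every atom of positive probability.
forced = {}
for pattern, atom in atoms.items():
    mass = sum(p[w] for w in atom)
    if mass > 0:
        forced[pattern] = sum(f(w) * p[w] for w in atom) / mass
```

So any version of the conditional expectation agrees with the classical one except possibly on a set of probability zero.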
For the finite case this is all a bit unnecessary, but it works. If I define a conditional expectation operator $\mathbb E(\circ |\mathcal G)(\circ): \ell_1\times\Omega\to[-\infty,\infty]$ this will give me an expectation operator
and hence a probability distribution for almost every $\omega$.
Furthermore that conditional probability distribution will agree with
the classical one for almost every choice of $\omega$.
Suppose instead of $n$ questions I allowed you to ask a countably infinite list of questions. Now we want to do the same thing, but the set of possible answer sequences is uncountable. So it's quite possible that every set of answers occurs with probability $0$, and I can't use the normal definition of conditional expectation.
But what I want to achieve is the same thing. For every set of answers I want a conditional distribution on my probability space given those answers.
The ideas above still work with infinite $\sigma$-algebras, but you need to mess about with Radon–Nikodym derivatives to prove it, and I'll assume you're familiar with that.
But it turns out there always exists a conditional expectation operator satisfying the two conditions above ($\mathcal G$-measurability and the combined identity).
So, although formally we describe conditional expectation as a random variable associated with each $\mathcal F$-measurable function, anything that satisfies the conditions gives me a probability distribution for almost every $\omega$. I can interpret that distribution as the conditional distribution given that I "know" $\mathcal G$.
Best Answer
Your interpretation of knowing whether $A$ happened or not for all $A \in \mathcal A$ is essentially correct. However, things get a little tricky in that interpretation because $\mathcal A$ is often augmented by the null sets of $\mathcal F$. For example, if we are looking at a filtration $(\mathcal F_t)$ generated by a Brownian motion $W$, we often augment $\mathcal F_t$ by the null sets of $\mathcal F_\infty$. This means that events like $\{W_2 = x\}$ lie in $\mathcal F_1$ for all $x \in \mathbb{R}$, since each such event is a null set ($W_2$ has a continuous distribution). If we wanted to interpret $\mathbb{E}[W_2|\mathcal F_1]$ as knowing whether or not $A$ happened for all $A \in \mathcal A$, this would make it seem like we should know the value of $W_2$, since we know whether or not $W_2 = x$ for all $x \in \mathbb{R}$.
Instead, I would interpret conditioning on $\mathcal A$ as being able to ask whether or not $A$ occurred for all $A \in \mathcal A$. Since $\mathcal A$ is closed under countable unions, we can ask about countably many events. In the example about the Brownian motion, this makes it so that while we can ask whether $W_2 = x$ for any $x \in \mathbb{R}$, there's no point because we already know that (with probability one) the answer will be "no."