About the update
Your update makes complete sense: whether B can learn the state of system A just by measuring his own system very often is related to the problem of distinguishing quantum states:
"Alice chooses a state $\left | \psi_i\right>$ from some fixed set of states $\{\left|\psi_j\right>|1\leq j\leq n\}$ known to both parties. She gives the state to Bob, whose task it is to identify the index $i$ of the state Alice has given him." -Nielsen and Chuang, pg. 86. 10th anniversary edition.
Bob can reliably distinguish the states if and only if they are orthogonal. If they are not, the best he can do is to sometimes determine which state it is not:
"However, it is possible for him to perform a measurement which distinguishes the states some of the time, but never makes an error of mis-identification" - Nielsen and Chuang, pg. 92. 10th anniversary edition.
Note that this holds regardless of how many times Bob measures A, i.e., how many copies Alice sends him, because we cannot assume she sends the same state every time (remember that Alice's reduced state is a mixture of two non-orthogonal states). If Bob could build a cloning machine, he could perform local tomography and reliably distinguish Alice's states, but this is not possible. Conversely, if he could distinguish Alice's non-orthogonal states, he could build a cloning machine (Nielsen and Chuang, pg. 531, 10th anniversary edition).
We are in a slightly modified version of the distinguishing problem, where Bob's system is entangled with Alice's, but nothing changes, since these results are general. Schlosshauer probably refers to orthogonal states; if not, he is overlooking the no-cloning theorem.
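The "never mis-identifies, but only sometimes succeeds" measurement in the quote can be checked concretely. The sketch below assumes the pair $\left|0\right>$, $\left|+\right>$ and the POVM $E_1 = \frac{\sqrt2}{1+\sqrt2}\left|1\right>\left<1\right|$, $E_2 = \frac{\sqrt2}{1+\sqrt2}\left|-\right>\left<-\right|$, $E_3 = I - E_1 - E_2$, which we believe is the example Nielsen and Chuang give in that section; treat the constants as an assumption. It uses pure Python with real amplitudes, and the helper names (`projector`, `prob`, `min_eigenvalue`) are hypothetical.

```python
import math

# Real amplitudes for the qubit states used in this sketch.
ZERO = [1.0, 0.0]
ONE = [0.0, 1.0]
PLUS = [1 / math.sqrt(2), 1 / math.sqrt(2)]
MINUS = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def projector(v, scale=1.0):
    """scale * |v><v| as a real 2x2 matrix."""
    return [[scale * v[i] * v[j] for j in range(2)] for i in range(2)]


def prob(E, psi):
    """Outcome probability <psi|E|psi> for a real POVM element E."""
    return sum(psi[i] * E[i][j] * psi[j] for i in range(2) for j in range(2))


def min_eigenvalue(M):
    """Smallest eigenvalue of a real symmetric 2x2 matrix."""
    half_tr = (M[0][0] + M[1][1]) / 2
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return half_tr - math.sqrt(max(half_tr ** 2 - det, 0.0))


c = math.sqrt(2) / (1 + math.sqrt(2))
E1 = projector(ONE, c)    # never fires on |0>, so it signals |+>
E2 = projector(MINUS, c)  # never fires on |+>, so it signals |0>
E3 = [[(1.0 if i == j else 0.0) - E1[i][j] - E2[i][j] for j in range(2)]
      for i in range(2)]  # inconclusive outcome

print("errors:", prob(E1, ZERO), prob(E2, PLUS))
print("successes:", prob(E1, PLUS), prob(E2, ZERO))
print("E3 is positive semidefinite:", min_eigenvalue(E3) >= -1e-12)
```

The conclusive outcomes are never wrong, each state is identified with probability $1 - 1/\sqrt2 \approx 0.29$, and otherwise Bob gets the inconclusive outcome $E_3$.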
About the questions
When people talk about pure bipartite states, they often use the Schmidt decomposition. So, given a state $\left|\Psi \right>$, let's write
$$
\left|\Psi \right> = \sum_j \lambda_j \left|\phi_j^A \right>\left|\phi_j^B \right>, \quad \lambda_j \geq 0.
$$
Now, if we compute the correlation function $K = \left< O_A \otimes O_B \right> - \left< O_A \right> \left< O_B \right>$ for the state in this form, we get
$$
K = \sum_{j,k} \lambda_j \lambda_k \left < \phi_j^A \right| O_A \left |\phi_k^A\right> \left < \phi_j^B \right| O_B \left |\phi_k^B\right> - \left(\sum_j \lambda_j^2 \left < \phi_j^A \right| O_A \left |\phi_j^A\right> \right ) \left (\sum_j \lambda_j^2 \left < \phi_j^B \right| O_B \left |\phi_j^B\right> \right).
$$
This brings us to the answer to your first question: the state is a product state if and only if exactly one $\lambda_j$, say $\lambda_1$, is nonzero. In that case (by normalization $\lambda_1 = 1$),
$$
K = \left < \phi_1^A \right| O_A \left |\phi_1^A\right> \left < \phi_1^B \right| O_B \left |\phi_1^B\right> - \left < \phi_1^A \right| O_A \left |\phi_1^A\right> \left < \phi_1^B \right| O_B \left |\phi_1^B\right> =0 , \quad \forall O_A,O_B.
$$
So
If the state $\left | \Psi \right>$ is a product state, then $K=0$ for every $O_A,O_B$.
and, conversely (as the explicit observables constructed below show),
If the state $\left | \Psi \right>$ is entangled, then $K\neq0$ for some $O_A,O_B$.
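As a sanity check of the Schmidt-form expression for $K$, the sketch below compares it with a direct evaluation of $\left< O_A \otimes O_B \right> - \left< O_A \right>\left< O_B \right>$ from the full coefficient matrix, and confirms that $K$ vanishes for a product state. It assumes real amplitudes and real symmetric matrices as observables (a special case of Hermitian operators); `k_schmidt`, `k_direct` and `random_symmetric` are hypothetical helper names.

```python
import math
import random


def k_schmidt(lams, A, B):
    """K from the Schmidt-form expression; A, B hold O_A, O_B in the Schmidt bases."""
    n = len(lams)
    joint = sum(lams[j] * lams[k] * A[j][k] * B[j][k] for j in range(n) for k in range(n))
    mean_a = sum(lams[j] ** 2 * A[j][j] for j in range(n))
    mean_b = sum(lams[j] ** 2 * B[j][j] for j in range(n))
    return joint - mean_a * mean_b


def k_direct(C, A, B):
    """K = <O_A (x) O_B> - <O_A><O_B> for the real state sum_ab C[a][b] |a>|b>."""
    da, db = len(C), len(C[0])
    joint = sum(C[a][b] * A[a][a2] * B[b][b2] * C[a2][b2]
                for a in range(da) for a2 in range(da)
                for b in range(db) for b2 in range(db))
    mean_a = sum(C[a][b] * A[a][a2] * C[a2][b]
                 for a in range(da) for a2 in range(da) for b in range(db))
    mean_b = sum(C[a][b] * B[b][b2] * C[a][b2]
                 for a in range(da) for b in range(db) for b2 in range(db))
    return joint - mean_a * mean_b


def random_symmetric(n, rng):
    """Random real symmetric n x n matrix, used as a real observable."""
    M = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]


rng = random.Random(1)
d = 3
A, B = random_symmetric(d, rng), random_symmetric(d, rng)

# Entangled state with three nonzero Schmidt coefficients, written in its Schmidt basis.
raw = [rng.random() + 0.1 for _ in range(d)]
norm = math.sqrt(sum(x * x for x in raw))
lams = [x / norm for x in raw]
C = [[lams[i] if i == j else 0.0 for j in range(d)] for i in range(d)]
print(k_schmidt(lams, A, B), k_direct(C, A, B))  # the two expressions agree

# Product state: only lambda_1 = 1 is nonzero, so K vanishes for any observables.
product = [1.0, 0.0, 0.0]
print(k_schmidt(product, A, B))
```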
Now, to learn what happens for entangled states, let's first consider a special kind of entangled state. Setting $\lambda_1 = \sqrt{1-\epsilon}$, $\lambda_2 = \sqrt{\epsilon}$ and $\lambda_j = 0$ for $j>2$, we get
$$
\left| \Psi \right> = \sqrt{1-\epsilon} \left|\phi_1^A \right>\left|\phi_1^B \right> + \sqrt{\epsilon} \left|\phi_2^A \right>\left|\phi_2^B \right>, \quad \epsilon \in [0,1]
$$
For small $\epsilon$, we can say this state is "almost product". The correlation function is now
$$
K = \epsilon (1-\epsilon) (O_A^{11}-O_A^{22})(O_B^{11}-O_B^{22}) + \sqrt{\epsilon(1-\epsilon)}(O_A^{12}O_B^{12}+O_A^{21}O_B^{21})
$$
where
$O_A^{ij} = \left < \phi_i^A \right| O_A \left |\phi_j^A\right>$, and similarly for Bob's observable. We see that $K$ depends on $\epsilon$, our entanglement parameter, but also on whether $O_A$ and $O_B$ have support on the spaces spanned by $\{\left |\phi_1^A\right>,\left |\phi_2^A\right> \}$ and $\{\left |\phi_1^B\right>,\left |\phi_2^B\right> \}$. For example, if $O_A = \left | \phi_3^A\right >\left < \phi_3^A\right |$ and $O_B = \left | \phi_3^B\right >\left < \phi_3^B\right |$, then clearly $K=0$ whether or not $\left|\Psi\right>$ is entangled. This brings us to the answer to your second question: yes, we can use the correlation function to quantify entanglement but, as you noticed, we must consider all observables $O_A,O_B$, or at least the optimal ones.
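The two-term case can be checked numerically against the general Schmidt-form expression. Expanding that expression with only $\lambda_1 = \sqrt{1-\epsilon}$ and $\lambda_2 = \sqrt{\epsilon}$ nonzero, the diagonal matrix elements enter as $\epsilon(1-\epsilon)(O_A^{11}-O_A^{22})(O_B^{11}-O_B^{22})$ and the off-diagonal ones as $\sqrt{\epsilon(1-\epsilon)}(O_A^{12}O_B^{12}+O_A^{21}O_B^{21})$. The sketch below (real symmetric observables on three-dimensional factors, hypothetical helper names) compares the two and reproduces the $\left|\phi_3\right>$ projector example.

```python
import math
import random


def k_schmidt(lams, A, B):
    """K from the Schmidt-form expression (A, B: matrix elements in the Schmidt bases)."""
    n = len(lams)
    joint = sum(lams[j] * lams[k] * A[j][k] * B[j][k] for j in range(n) for k in range(n))
    mean_a = sum(lams[j] ** 2 * A[j][j] for j in range(n))
    mean_b = sum(lams[j] ** 2 * B[j][j] for j in range(n))
    return joint - mean_a * mean_b


def k_two_term(eps, A, B):
    """Closed form for lambda_1 = sqrt(1 - eps), lambda_2 = sqrt(eps); index 0 is phi_1, 1 is phi_2."""
    diag = eps * (1 - eps) * (A[0][0] - A[1][1]) * (B[0][0] - B[1][1])
    off = math.sqrt(eps * (1 - eps)) * (A[0][1] * B[0][1] + A[1][0] * B[1][0])
    return diag + off


def random_symmetric(n, rng):
    """Random real symmetric n x n matrix, used as a real observable."""
    M = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]


rng = random.Random(3)
eps = 0.1
lams = [math.sqrt(1 - eps), math.sqrt(eps), 0.0]  # three-dimensional factors, lambda_3 = 0
A, B = random_symmetric(3, rng), random_symmetric(3, rng)
print(k_schmidt(lams, A, B), k_two_term(eps, A, B))  # the two agree

# Projectors onto phi_3 have no support on span{phi_1, phi_2}: K = 0 although the state is entangled.
P3 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(k_schmidt(lams, P3, P3))
```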
From the expression we obtained for $K$, it is not hard to search for optimal observables. As you probably noticed, we must consider observables with support on the spaces spanned by $\{\left |\phi_1^A\right>,\left |\phi_2^A\right> \}$ and $\{\left |\phi_1^B\right>,\left |\phi_2^B\right> \}$. Taking $O_A = \left |\phi_1^A\right>\left <\phi_2^A\right| + \left |\phi_2^A\right>\left <\phi_1^A\right|$ and $O_B = \left |\phi_1^B\right>\left <\phi_2^B\right| + \left |\phi_2^B\right>\left <\phi_1^B\right|$, all diagonal elements vanish and $O_A^{12}=O_A^{21}=O_B^{12}=O_B^{21}=1$, so we get
$$
K = 2\sqrt{\epsilon(1-\epsilon)}
$$
For observables with eigenvalues in $[-1,1]$, such as these, this is the maximum value for each $\epsilon$: the diagonal term can contribute at most $4\epsilon(1-\epsilon)$, and $2\sqrt{\epsilon(1-\epsilon)} \geq 4\epsilon(1-\epsilon)$ for $\epsilon \in [0,1]$. It is easy to check that $K$ is maximal ($K=1$) if and only if $\epsilon = 1/2$, which is the greatest entanglement we can have for the kind of states we considered (effectively two-qubit states).
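The optimality argument can be probed numerically. The sketch below evaluates the general Schmidt-form $K$ for the off-diagonal choice (a $\sigma_x$-like observable in the Schmidt basis), for the diagonal choice ($\sigma_z$-like), and for a grid of real observables $\cos\theta\,\sigma_z + \sin\theta\,\sigma_x$ with eigenvalues $\pm1$. Restricting to this real family is an assumption of the sketch (it leaves out $\sigma_y$ components; identity components do not change $K$), and the helper names are hypothetical.

```python
import math


def k_schmidt(lams, A, B):
    """K from the Schmidt-form expression (A, B: matrix elements in the Schmidt bases)."""
    n = len(lams)
    joint = sum(lams[j] * lams[k] * A[j][k] * B[j][k] for j in range(n) for k in range(n))
    mean_a = sum(lams[j] ** 2 * A[j][j] for j in range(n))
    mean_b = sum(lams[j] ** 2 * B[j][j] for j in range(n))
    return joint - mean_a * mean_b


def observable(theta):
    """Real observable cos(theta) sigma_z + sin(theta) sigma_x, eigenvalues +1 and -1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]   # |phi_1><phi_2| + |phi_2><phi_1|
SIGMA_Z = [[1.0, 0.0], [0.0, -1.0]]  # |phi_1><phi_1| - |phi_2><phi_2|


def best_on_grid(eps, steps=60):
    """Largest K over a grid of observable(a) (x) observable(b)."""
    lams = [math.sqrt(1 - eps), math.sqrt(eps)]
    angles = [2 * math.pi * t / steps for t in range(steps)]
    return max(k_schmidt(lams, observable(a), observable(b)) for a in angles for b in angles)


for eps in [0.05, 0.25, 0.5, 0.75]:
    lams = [math.sqrt(1 - eps), math.sqrt(eps)]
    kx = k_schmidt(lams, SIGMA_X, SIGMA_X)  # off-diagonal choice: 2 sqrt(eps (1 - eps))
    kz = k_schmidt(lams, SIGMA_Z, SIGMA_Z)  # diagonal choice: 4 eps (1 - eps)
    print(f"eps={eps}: K_x={kx:.4f}, K_z={kz:.4f}, grid max={best_on_grid(eps):.4f}")
```

On this grid the largest value coincides with the off-diagonal choice, and it peaks at $K=1$ for $\epsilon = 1/2$.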
General pure bipartite states
The general case does not seem much harder to deal with. We can consider
$$
O_A = \sum_{j\neq k} \left|\phi_j^A\right>\left<\phi_k^A\right|
$$
and the same for B, so we get
$$
K = \sum_{j\neq k} \lambda_j \lambda_k = \left (\sum_j \lambda_j \right)^2 - 1
$$
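The identity $K = \sum_{j\neq k}\lambda_j\lambda_k = \left(\sum_j \lambda_j\right)^2 - 1$ for the all-off-diagonal observable can be checked quickly from the general Schmidt-form expression. This is a sketch with random normalized Schmidt coefficients; `all_off_diagonal` is a hypothetical helper name.

```python
import math
import random


def k_schmidt(lams, A, B):
    """K from the Schmidt-form expression (A, B: matrix elements in the Schmidt bases)."""
    n = len(lams)
    joint = sum(lams[j] * lams[k] * A[j][k] * B[j][k] for j in range(n) for k in range(n))
    mean_a = sum(lams[j] ** 2 * A[j][j] for j in range(n))
    mean_b = sum(lams[j] ** 2 * B[j][j] for j in range(n))
    return joint - mean_a * mean_b


def all_off_diagonal(n):
    """Matrix of sum_{j != k} |phi_j><phi_k| in the Schmidt basis."""
    return [[0.0 if j == k else 1.0 for k in range(n)] for j in range(n)]


rng = random.Random(7)
for n in range(2, 6):
    raw = [rng.random() + 0.01 for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in raw))
    lams = [x / norm for x in raw]  # normalized Schmidt coefficients
    O = all_off_diagonal(n)
    print(n, k_schmidt(lams, O, O), sum(lams) ** 2 - 1)  # the two columns agree
```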
So to maximize $K$, we must maximize the sum inside the parentheses, subject to the normalization $\sum_j \lambda_j^2 = 1$. Using the method of Lagrange multipliers, we consider the function
$$
\mathcal L = \sum_j \lambda_j - \mu \left (\sum_j \lambda_j^2 - 1\right)
$$
$$
\frac{\partial \mathcal L}{\partial \lambda_j} = 1-2\mu \lambda_j = 0
$$
$$
\lambda_j = \frac{1}{2\mu}, \forall j.
$$
If the state $\left | \Psi \right >$ has $r$ non-zero Schmidt coefficients, then
$$
\sum_j \lambda_j^2 = \frac{1}{4\mu^2}\sum_j 1 = \frac{r}{4\mu^2} = 1
$$
which implies that
$$
\mu = \frac{\sqrt{r}}{2} \rightarrow \lambda_j = \frac{1}{\sqrt{r}}.
$$
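Two short steps complete this, added here as a sketch. Substituting the stationary point back into $K$ gives its value, and the Cauchy-Schwarz inequality (not part of the derivation above) confirms that this stationary point is a maximum and not just a critical point:
$$
K_{\max} = \left(\sum_{j=1}^r \frac{1}{\sqrt r}\right)^2 - 1 = r - 1, \qquad \sum_{j=1}^r \lambda_j \leq \sqrt{r}\,\sqrt{\sum_{j=1}^r \lambda_j^2} = \sqrt r .
$$
So, for these observables, $K \leq r - 1$, with equality exactly when all non-zero Schmidt coefficients equal $1/\sqrt r$.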
In the special case where the Hilbert spaces of A and B have equal dimensions, $d_A=d_B =d$, and the shared state $\left | \Psi \right>$ has $d$ non-zero Schmidt coefficients, we get
$$
\lambda_j = \frac{1}{\sqrt{d}} \rightarrow \left | \Psi \right> = \frac{1}{\sqrt{d}} \sum_{j=1}^d \left|\phi_j^A\right> \left| \phi_j^B\right>
$$
which is the maximally entangled state. So
Considering the optimal observables $O_A,O_B$, the correlation function $K$ is maximal if and only if $\left |\Psi \right>$ is the maximally entangled state.
In summary, the correlation function could be a good entanglement quantifier for pure bipartite states if we consider optimal observables $O_A,O_B$. If we do not use the optimal ones, we can be misled: for a given entangled state, we can find observables that show no correlation at all.
Best Answer
First, the definition of entanglement.
Let $\mathcal{H} \cong \mathcal{H}_1 \otimes \mathcal{H}_2$ be your composite Hilbert space (of two particles). A state $\lvert \psi \rangle \in \mathcal{H}$ is entangled precisely when $\lvert \psi \rangle \neq \lvert \phi \rangle_1 \otimes \lvert \varphi \rangle_2$ for all $\lvert \phi \rangle_1 \in \mathcal{H}_1$ and $\lvert \varphi \rangle_2 \in \mathcal{H}_2$, i.e., when it cannot be written as a product state.
It is important to note that this definition of being entangled is utterly independent of the chosen basis for your Hilbert space. Rather, whether two systems are entangled depends on the tensor product factorization of the composite Hilbert space.
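To illustrate the definition and its independence from the basis chosen within each factor, here is a small sketch for two qubits with real amplitudes (an assumption made to keep it in plain Python). It relies on the standard fact that a pure state $\sum_{ab} C_{ab}\lvert a\rangle\lvert b\rangle$ is a product state exactly when its coefficient matrix $C$ has rank one, which for a 2x2 matrix means $\det C = 0$. The helper names (`is_product`, `rotation`, `local_change`) are hypothetical.

```python
import math


def det2(C):
    return C[0][0] * C[1][1] - C[0][1] * C[1][0]


def is_product(C, tol=1e-12):
    """Nonzero coefficient matrix C[a][b] of sum_ab C[a][b] |a>|b>:
    product state iff C has rank 1, i.e. det C = 0."""
    return abs(det2(C)) < tol


def rotation(theta):
    """Real single-qubit change of basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def local_change(C, RA, RB):
    """Coefficients after RA acts on particle 1 and RB on particle 2: C -> RA C RB^T."""
    RC = [[sum(RA[i][k] * C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(RC[i][k] * RB[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


product = [[0.6, 0.8], [0.0, 0.0]]  # |0> (x) (0.6|0> + 0.8|1>)
bell = [[1 / math.sqrt(2), 0.0], [0.0, 1 / math.sqrt(2)]]
RA, RB = rotation(0.3), rotation(1.1)
print(is_product(product), is_product(bell))
print(is_product(local_change(product, RA, RB)), is_product(local_change(bell, RA, RB)))
```

A local change of basis multiplies $\det C$ by $\det R_A \det R_B$, which has modulus one, so the verdict cannot change; what matters is the split $\mathcal{H}_1 \otimes \mathcal{H}_2$.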
This seems fine if you are looking to translate the mathematical definition above into words. That said, I think it would be more precise to say "A and B are entangled if the measurement outcomes of A are quantum correlated with the measurement outcomes of B", where quantum correlations are (tautologically) defined as the correlations allowed by quantum mechanics.
However, all this discussion has been about whether a bipartite state is entangled or not, and says nothing about how entangled the two systems described by a state are. For pure states, a common (I think) measure of this is the von Neumann entropy of the reduced state (also called the entanglement entropy, although "entanglement entropy" more broadly refers to whatever measure is used to quantify entanglement). Let $\rho$ be a density matrix representing a pure state of a two-particle system, and let $\rho_A = \text{Tr}_B(\rho)$ be the reduced density matrix for system A. The von Neumann entropy of $\rho_A$ is
$$S(\rho_A) = -\text{Tr}_A(\rho_A\log_2\rho_A)$$
which gives a measure of how entangled the two systems described by the state $\rho$ are. For why, cf. slides 8 and 9 here.
I think this von Neumann entropy measure of entanglement is what Schlosshauer has in mind (cf. section 2.4.3 in the Decoherence text you're working through). The von Neumann entropy is basis independent, as can be verified: it depends only on the eigenvalues of $\rho_A$. Hence, this measure of how entangled two systems described by a state are is basis independent.
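The basis independence can be seen in a small example. The sketch below assumes two qubits with real amplitudes, builds $\rho_A = C C^T$ from the coefficient matrix $C$, computes $S(\rho_A)$ from its eigenvalues, and shows that the same state written in rotated local bases gives the same entropy. The helper names are hypothetical.

```python
import math


def rotation(theta):
    """Real single-qubit change of basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def local_change(C, RA, RB):
    """Coefficients after RA acts on A and RB on B: C -> RA C RB^T."""
    RC = [[sum(RA[i][k] * C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(RC[i][k] * RB[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


def reduced_spectrum(C):
    """Eigenvalues of rho_A = Tr_B |psi><psi| = C C^T for real coefficients C[a][b]."""
    r00 = C[0][0] ** 2 + C[0][1] ** 2
    r11 = C[1][0] ** 2 + C[1][1] ** 2
    r01 = C[0][0] * C[1][0] + C[0][1] * C[1][1]
    half_tr = (r00 + r11) / 2
    disc = math.sqrt(max(half_tr ** 2 - (r00 * r11 - r01 ** 2), 0.0))
    return [half_tr + disc, half_tr - disc]


def entropy_A(C):
    """S(rho_A) = -sum_p p log2 p, skipping (numerically) zero eigenvalues."""
    return -sum(p * math.log2(p) for p in reduced_spectrum(C) if p > 1e-15)


eps = 0.2
state = [[math.sqrt(1 - eps), 0.0], [0.0, math.sqrt(eps)]]   # Schmidt form
rotated = local_change(state, rotation(0.4), rotation(-1.3))  # same state, other local bases
product = [[1.0, 0.0], [0.0, 0.0]]
print(entropy_A(state), entropy_A(rotated), entropy_A(product))
```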