Hypothesis Space – Definition and Explanation in Machine Learning

Tags: definition, machine-learning, terminology

Whilst I understand the term conceptually, I'm struggling to understand it operationally. Could anyone help me out by providing an example?
Related Solutions
You are absolutely correct in observing that even though $\mathbf{u}$ (one of the eigenvectors of the covariance matrix, e.g. the first one) and $\mathbf{X}\mathbf{u}$ (projection of the data onto the 1-dimensional subspace spanned by $\mathbf{u}$) are two different things, both of them are often called "principal component", sometimes even in the same text.
In most cases it is clear from the context what exactly is meant. In some rare cases, however, it can indeed be quite confusing, e.g. when some related techniques (such as sparse PCA or CCA) are discussed, where different directions $\mathbf{u}_i$ do not have to be orthogonal. In this case a statement like "components are orthogonal" has very different meanings depending on whether it refers to axes or projections.
I would advocate calling $\mathbf{u}$ a "principal axis" or a "principal direction", and $\mathbf{X}\mathbf{u}$ a "principal component".
I have also seen $\mathbf u$ called "principal component vector".
I should mention that the alternative convention is to call $\mathbf u$ "principal component" and $\mathbf{Xu}$ "principal component scores".
Summary of the two conventions:
$$\begin{array}{c|c|c} & \text{Convention 1} & \text{Convention 2} \\ \hline \mathbf u & \begin{cases}\text{principal axis}\\ \text{principal direction}\\ \text{principal component vector}\end{cases} & \text{principal component} \\ \mathbf{Xu} & \text{principal component} & \text{principal component scores} \end{array}$$
Note: only eigenvectors of the covariance matrix corresponding to non-zero eigenvalues can be called principal directions/components. If the covariance matrix is of low rank, it will have one or more zero eigenvalues; the corresponding eigenvectors (and the corresponding projections, which are identically zero) should not be called principal directions/components. See some discussion in my answer here.
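For concreteness, here is a minimal NumPy sketch (the toy data and variable names are my own) showing both objects from the table side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # data matrix, rows = observations
X = X - X.mean(axis=0)               # center the columns first

C = np.cov(X, rowvar=False)          # sample covariance matrix
eigvals, U = np.linalg.eigh(C)       # eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]    # re-sort by decreasing eigenvalue
eigvals, U = eigvals[order], U[:, order]

u = U[:, 0]          # "principal axis/direction" (Convention 1)
component = X @ u    # "principal component" (Conv. 1) / "scores" (Conv. 2)
```

Note that `u` lives in feature space (one entry per variable), whereas `X @ u` has one entry per observation; that difference is exactly what the two conventions blur.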
The problem with this kind of definition is that it is ambiguous and can be understood differently by different people and in different contexts. Wikipedia says that a heuristic "is any approach to problem-solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals."
How do you know that a solution is optimal or perfect? When you are dealing with random phenomena, you cannot get "perfect" (i.e., always correct) results. What machine learning algorithms give you is the best result you can get, given that certain conditions are met. Moreover, each of the commonly used algorithms comes with some guarantees of optimality in certain scenarios (if it didn't, we wouldn't use it).
Heuristics have a very similar, though more precise, meaning in computer science; tl;dr: they are algorithms that seek an approximate, opinionated solution rather than the exact one. In machine learning there is usually no exact solution, so one is not achievable by any algorithm.
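As a toy illustration (my own example, not from the quoted definition): gradient descent on a non-convex function behaves exactly like a heuristic in this sense. It returns whichever local minimum the starting point leads to, which is sufficient for the immediate goal but not guaranteed to be the global optimum.

```python
# A non-convex objective with two local minima; gradient descent is a
# heuristic here: it finds *a* minimum, not necessarily the global one.
def grad(x):
    # derivative of f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(gradient_descent(x0=2.0))   # converges to the minimum near x ~ 1.13
print(gradient_descent(x0=-2.0))  # converges to a different one near x ~ -1.30
```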
Best Answer
Let's say you have an unknown target function $f: X \rightarrow Y$ that you are trying to capture by learning. In order to capture the target function, you come up with hypotheses, or candidate models, $h_1, \dots, h_n$, where $h_i \in H$. Here $H$, the set of all candidate models, is called the hypothesis class, hypothesis space, or hypothesis set.
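A minimal sketch (a toy setup of my own, not from the slides below) that makes these pieces concrete: the target $f$ is a simple threshold function, the hypothesis space $H$ is a finite set of threshold classifiers, and learning amounts to picking the $h \in H$ with the lowest training error.

```python
import numpy as np

# Toy hypothesis space H: threshold classifiers h_t(x) = sign(x - t)
# over a finite grid of thresholds t. "Learning" = selecting the h in H
# with the lowest training error.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=50)
y = np.sign(X - 0.3)                  # the unknown target f (known here for demo)

thresholds = np.linspace(-1, 1, 201)  # H = {h_t : t in this grid}

def h(t, x):
    return np.sign(x - t)

errors = [np.mean(h(t, X) != y) for t in thresholds]
best_t = thresholds[np.argmin(errors)]
print(f"selected hypothesis: h(x) = sign(x - {best_t:.2f})")
```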
For more information, browse Abu-Mostafa's presentation slides: https://work.caltech.edu/textbook.html