(Answering own question)
Here we are given that
$A: W \to V$ and $Y: V \to U$, and the composition is then $Y \circ A : W \to U$.
From an earlier part (of the paper, the section on the left side of page 10), we see that
$$\textrm{dim(Im}(A)) = \textrm{dim(Im}(YA)) + \textrm{dim(Ker(}Y) \cap \textrm{Im}(A))$$
Look at the last term: it counts the vectors of Im$(A)$ that $Y$ kills, i.e. those also lying in Ker$(Y)$. The middle term counts the part of Im$(A)$ that survives $Y$, i.e. lands in Im$(YA)$. Together these account for all of Im$(A)$; hence the equality of dimensions.
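The identity is easy to sanity-check numerically. Here is a small sketch with NumPy (random matrices stand in for $A$ and $Y$; the helper `null_space` and the intersection formula $\dim(S \cap T) = \dim S + \dim T - \dim(S+T)$ are my own additions):

```python
import numpy as np

rng = np.random.default_rng(0)

def null_space(M, tol=1e-10):
    # Orthonormal basis of Ker(M): right singular vectors whose
    # singular values are (numerically) zero.
    _, s, vt = np.linalg.svd(M)
    r = int(np.sum(s > tol))
    return vt[r:].T

# Random A : W -> V and Y : V -> U with dim W = 5, dim V = 4, dim U = 3.
A = rng.standard_normal((4, 5))
Y = rng.standard_normal((3, 4))

ker_Y = null_space(Y)            # columns span Ker(Y)
rank = np.linalg.matrix_rank

# dim(Ker(Y) ∩ Im(A)) = dim Ker(Y) + dim Im(A) - dim(Ker(Y) + Im(A))
dim_int = ker_Y.shape[1] + rank(A) - rank(np.hstack([ker_Y, A]))

# The identity: dim Im(A) = dim Im(YA) + dim(Ker(Y) ∩ Im(A))
assert rank(A) == rank(Y @ A) + dim_int
```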
Now taking a similar approach for the basis question, we can first note that Ker$(YA)^{\perp} = $RowSpace$(YA)$, which has the same dimension as Im$(YA)$ (the row space and the image are interchangeable for this dimension count). So we just need to relate Ker$(YA)/$Ker$(A)$ to Ker$(Y)$ $\cap$ Im$(A)$. Observe that Ker($YA)$ contains two kinds of vectors: (1) those in Ker($A$), which map to the zero vector of $V$ and are then taken to the zero vector of $U$ by the linear map $Y$; and (2) those $w \in W$ for which $Aw$ is a nonzero vector of Ker($Y$), since $Aw$ then maps to the zero vector of $U$ by definition. The quotient vector space Ker($YA)$/Ker($A$) has cosets as its elements, with all of Ker($A)$ collapsed into the identity coset.
Ker($Y) \cap$ Im$(A)$ is pretty much the part of Ker($Y$) that $A$ can reach. The rank-nullity theorem tells us that dim(Ker($A$)) = dim$(W)$ - dim(Im($A$)). The quotient space then has dimension dim(Ker($YA$)) - dim(Ker($A$)): this is the vector-space analogue of Lagrange's theorem, since all cosets are translates of Ker($A$) of the same size (equality of coset sizes is the usual group theory theorem), but dimensions subtract rather than divide. For example, if dim(Ker($A)) = 0$ (the kernel is the zero vector only), each coset is a single vector and the quotient is just Ker($YA$) itself; if Ker($A$) is nontrivial, each coset is a full translate of Ker($A$). Either way, the map $w + \textrm{Ker}(A) \mapsto Aw$ sends each coset to a well-defined vector of Ker$(Y) \cap $Im$(A)$: $Aw$ lies in Im$(A)$ by construction and in Ker$(Y)$ because $YAw = \textbf{0}$, and the value does not depend on the choice of coset representative. Conversely, every vector of Ker$(Y) \cap $Im$(A)$ equals $Aw$ for some $w \in $Ker$(YA)$, so this map is onto, and we get an isomorphism Ker$(YA)/$Ker$(A) \cong $Ker$(Y) \cap $Im$(A)$. So coset representatives of Ker$(YA)/$Ker$(A)$ (chosen orthogonal to Ker($A$)) are carried by $A$ to a basis of Ker$(Y) \cap $Im$(A)$, and the bases of Ker$(YA)^{\perp}$ and of Ker$(YA)$/Ker($A$) together give a basis for Ker$(A)^{\perp} = $RowSpace$(A)$, which $A$ maps isomorphically onto Im$(A)$.
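The dimension count behind the quotient argument can also be checked numerically. A sketch (random matrices; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def null_dim(M):
    # dim Ker(M) = number of columns minus rank (rank-nullity).
    return M.shape[1] - np.linalg.matrix_rank(M)

A = rng.standard_normal((4, 5))   # A : W -> V
Y = rng.standard_normal((3, 4))   # Y : V -> U

# dim of the quotient Ker(YA)/Ker(A): dimensions subtract.
dim_quotient = null_dim(Y @ A) - null_dim(A)

# dim(Ker(Y) ∩ Im(A)) via dim(S) + dim(T) - dim(S + T) for subspaces.
_, s, vt = np.linalg.svd(Y)
ker_Y = vt[int(np.sum(s > 1e-10)):].T     # columns span Ker(Y)
rank = np.linalg.matrix_rank
dim_int = ker_Y.shape[1] + rank(A) - rank(np.hstack([ker_Y, A]))

assert dim_quotient == dim_int   # Ker(YA)/Ker(A) ≅ Ker(Y) ∩ Im(A)
```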
For part two:
Note that given a matrix $M$ and a complete orthonormal basis $e_i$ of its domain ($e_i$ will be a column vector, and $e_i^T$ a row vector), we can expand $M$ over that basis using the sum of the dyads $e_ie_i^T$ (which add up to the identity):
$$M = (M\cdot e_1)e_1^T + (M\cdot e_2)e_2^T + ... + (M\cdot e_n)e_n^T = \sum_{i=1}^n(M\cdot e_i)e_i^T,$$
where $n$ is the dimension of the domain of $M$. This should look familiar from the question. Now observe that one way to build a complete orthonormal basis adapted to $M$ is to combine a basis of Ker$(M)$ with a basis of Ker$(M)^{\perp} = $RowSpace$(M)$ (which has the same dimension as Im$(M)$). Let $f_j$ be basis vectors of Ker$(M)$ and $e_i$ be basis vectors of Ker$(M)^{\perp}$; then
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(M\cdot f_j)f_j^T.$$
Notice though that for all $v \in \textrm{Ker}(M), M\cdot v = \textbf{0}$, so
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(M\cdot f_j)f_j^T = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(\textbf{0})f_j^T = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T,$$
and
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T = M\sum_{i=1}^{\textrm{dim(Im}(M))}e_ie_i^T.$$
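A quick numerical sketch of this decomposition, using a rank-deficient square matrix and the SVD to split the domain into Ker$(M)^{\perp}$ and Ker$(M)$ (all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)

# A 5x5 matrix of rank 3, so Ker(M) is 2-dimensional.
M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 5))

# Right singular vectors with nonzero singular values span Ker(M)^perp;
# the remaining ones span Ker(M).
_, s, vt = np.linalg.svd(M)
r = int(np.sum(s > 1e-10))
E = vt[:r].T          # columns e_i: orthonormal basis of Ker(M)^perp
F = vt[r:].T          # columns f_j: orthonormal basis of Ker(M)

assert np.allclose(M @ F, 0)                      # kernel terms (M f_j) vanish
assert np.allclose(M, (M @ E) @ E.T)              # M = sum_i (M e_i) e_i^T
assert np.allclose(E @ E.T + F @ F.T, np.eye(5))  # completeness relation
```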
Thus, if $\left\{e_{\alpha},\tilde{e}_{\beta}\right\}$ together form an orthonormal basis for Ker$(M)^{\perp}$ (so that $s + \delta = \textrm{dim(Im}(M))$), then
$$M = M\left\{\sum_{\alpha=1}^se_{\alpha}e_{\alpha}^T + \sum_{\beta=1}^{\delta}\tilde{e}_{\beta}\tilde{e}_{\beta}^T\right\}.$$
Edit: I think this is a pretty cool result! Generally, given any matrix $M$ (square or not), if $\left\{e_i : 1 \leq i \leq \textrm{dim(Im}(M))\right\}$ is an orthonormal basis for Ker$(M)^{\perp}$, then $M = M\sum_ie_ie_i^T$.
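A sketch suggesting the square assumption is indeed unnecessary, provided the $e_i$ are taken as an orthonormal basis of Ker$(M)^{\perp}$ (the row space) rather than of Im$(M)$ itself (names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# A non-square, rank-deficient matrix: 4x6 of rank 2.
M = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

# Orthonormal basis e_i of Ker(M)^perp from the SVD.
_, s, vt = np.linalg.svd(M)
E = vt[:int(np.sum(s > 1e-10))].T   # columns are the e_i

# sum_i e_i e_i^T projects onto Ker(M)^perp, and M is unchanged by it.
assert np.allclose(M, M @ (E @ E.T))
```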
Best Answer
If $N(\alpha) = N(\beta)$, there are two possibilities: either $N(\alpha) = N(\beta) = V$, in which case $R(\alpha) = R(\beta) = \{0\}$, that is to say, both $\alpha$ and $\beta$ are the zero functional from $V$ to $\textbf{F}$; or $N(\alpha) = N(\beta)$ has dimension $n-1$. In the latter case, let $\{v_{1},v_{2},\ldots,v_{n-1}\}$ be a basis for $N(\alpha)$; we can extend it to a basis of $V$.
Let $\mathcal{B} = \{v_{1},v_{2},\ldots,v_{n}\}$ be such a basis for $V$ and $\mathcal{B}^{*} = \{f_{1},f_{2},\ldots,f_{n}\}$ be the basis for $V^{*}$ where $f_{i}(v_{j}) = \delta_{ij}$. Therefore $\alpha = \alpha(v_{1})f_{1} + \alpha(v_{2})f_{2} + \ldots + \alpha(v_{n})f_{n}$ and $\beta = \beta(v_{1})f_{1} + \beta(v_{2})f_{2} + \ldots + \beta(v_{n})f_{n}$. Since $\alpha(v_{i}) = \beta(v_{i}) = 0$ for $1 \leq i \leq n-1$, we have $\alpha = \alpha(v_{n})f_{n}$ and $\beta = \beta(v_{n})f_{n}$, where $\alpha(v_{n})\beta(v_{n})\neq 0$, because $\alpha(v_{n})$ spans $R(\alpha)$, $\beta(v_{n})$ spans $R(\beta)$, and $\dim R(\alpha) = \dim R(\beta) = 1$.
Based on such considerations, we conclude that $\alpha = \lambda \beta$, where any $\lambda\in\textbf{F}$ works in the first case and $\lambda = \alpha(v_{n})/\beta(v_{n})$ in the second case. Such considerations result from the application of the following theorem (valid when $V$ and $W$ are finite dimensional): \begin{align*} \dim V = \dim N(T) + \dim R(T) \end{align*}
where $T$ is a linear mapping from the linear space $V$ to the linear space $W$. Hope this helps.
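The proportionality conclusion is also easy to verify numerically for functionals on $\mathbf{R}^{n}$. A sketch (the construction of $\beta$ and all names are my own):

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = rng.standard_normal(4)      # a nonzero functional on R^4

# Basis of N(alpha) from the SVD of alpha viewed as a 1x4 matrix.
_, s, vt = np.linalg.svd(alpha[None, :])
K = vt[1:].T                        # columns span N(alpha), dim 3

# An arbitrary functional beta vanishing on N(alpha): project a random
# vector onto the orthogonal complement of N(alpha).
b = rng.standard_normal(4)
beta = b - K @ (K.T @ b)

assert np.allclose(beta @ K, 0)           # beta vanishes on N(alpha)
lam = (alpha @ alpha) / (beta @ alpha)    # lambda = alpha(v)/beta(v), v outside the kernel
assert np.allclose(alpha, lam * beta)     # alpha = lambda * beta
```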