For instance, if $a,b>0$ and $\displaystyle A = \begin{bmatrix}a & 0 \\ 0 & b\end{bmatrix}$, how would one make this congruence transformation to get a basis in which $A=I$?
In this case you can take $S$ to be $\begin{bmatrix}\frac 1{\sqrt{a}} & 0 \\ 0 &\frac 1{\sqrt{b}}\end{bmatrix}$, or in other words taking $\mathbf e'_1=\frac 1{\sqrt{a}}\mathbf e_1$ and $\mathbf e'_2=\frac 1{\sqrt{b}}\mathbf e_2$.
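A quick numerical check of that congruence (a minimal numpy sketch; the values of $a$ and $b$ are arbitrary):

```python
import numpy as np

a, b = 3.0, 5.0                                  # any positive values
A = np.diag([a, b])
S = np.diag([1 / np.sqrt(a), 1 / np.sqrt(b)])    # columns are e'_1, e'_2
print(S.T @ A @ S)                               # congruence S^T A S = identity
```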
In general the question we want to answer is: Given a basis and an inner product, find a new basis which is orthonormal for that product. This is done by the Gram–Schmidt process.
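Here is a minimal sketch of Gram–Schmidt run against the inner product $\langle u,v\rangle = u^TAv$ (numpy; the matrix $A$ below is just an arbitrary symmetric positive definite example, not anything from the question):

```python
import numpy as np

def gram_schmidt(basis, A):
    """Orthonormalise `basis` w.r.t. the inner product <u, v> = u^T A v
    (A symmetric positive definite)."""
    ortho = []
    for v in basis:
        w = np.array(v, dtype=float)
        for u in ortho:
            w = w - (u @ A @ w) * u       # remove the <u, w> component
        w = w / np.sqrt(w @ A @ w)        # normalise in the A-norm
        ortho.append(w)
    return np.column_stack(ortho)         # new basis vectors as columns

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                # arbitrary SPD example
S = gram_schmidt(np.eye(2), A)
print(S.T @ A @ S)                        # ~ identity
```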
EDIT: The question has been updated slightly to ask this question:
That is: can we simultaneously diagonalise two quadratic forms in such a way that one of them is represented in the new basis by $I$, and the other by a diagonal matrix whose entries are the appropriate eigenvalues?
The answer is yes. Suppose our real symmetric matrices are $A$ and $B$, with $A$ positive definite. As above, find a basis in which $A\mapsto I$; in that basis $B$ becomes some new symmetric matrix. If we now change basis again by an orthogonal matrix $O$ (one such that $O^T=O^{-1}$), the matrix for $A$ is still $I$ (because it becomes $OIO^T=OIO^{-1}=I$). So we are done by the fact that real symmetric matrices can be diagonalised by orthogonal matrices: choose $O$ to diagonalise the new matrix for $B$.
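In computational terms this two-step recipe is exactly what a generalized symmetric eigensolver carries out; a sketch with scipy (the matrices here are random stand-ins, with $A$ made positive definite):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
A = X @ X.T + 3 * np.eye(3)       # symmetric positive definite
Y = rng.normal(size=(3, 3))
B = Y + Y.T                       # merely symmetric

# Solve B v = lambda A v; eigh normalises eigenvectors so that V^T A V = I.
lam, V = eigh(B, A)
print(np.round(V.T @ A @ V, 10))  # the identity
print(np.round(V.T @ B @ V, 10))  # diag(lam): B diagonalised simultaneously
```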
(Answering own question)
Here we are given that
$A: W \to V$ and $Y: V \to U$, so the composition is $Y \circ A : W \to U$.
From an earlier part of the paper (the section on the left side of page 10), we see that
$$\textrm{dim(Im}(A)) = \textrm{dim(Im}(YA)) + \textrm{dim(Ker(}Y) \cap \textrm{Im}(A))$$
Look at the last term; it counts the vectors in Ker$(Y)$ that are also in Im$(A)$, i.e. the part of Im$(A)$ that $Y$ kills. Now look at the middle term; it counts the part of Im$(A)$ that survives $Y$, namely its image Im$(YA)$. Together these account for all of Im$(A)$; hence the equality of dimensions.
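A numerical sanity check of that dimension count (numpy/scipy; the matrix sizes below are arbitrary):

```python
import numpy as np
from scipy.linalg import null_space, orth

rng = np.random.default_rng(1)
A = rng.integers(-2, 3, size=(4, 6)).astype(float)  # A : W -> V, dim W = 6
Y = rng.integers(-2, 3, size=(3, 4)).astype(float)  # Y : V -> U, dim V = 4

KY = null_space(Y)   # columns: orthonormal basis of Ker(Y)
IA = orth(A)         # columns: orthonormal basis of Im(A)
# dim(Ker(Y) ∩ Im(A)) = dim Ker(Y) + dim Im(A) - dim(Ker(Y) + Im(A))
inter = KY.shape[1] + IA.shape[1] - np.linalg.matrix_rank(np.hstack([KY, IA]))

lhs = np.linalg.matrix_rank(A)              # dim Im(A)
rhs = np.linalg.matrix_rank(Y @ A) + inter  # dim Im(YA) + dim(Ker(Y) ∩ Im(A))
print(lhs, rhs)                             # always equal
```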
Now taking a similar approach for the basis question, we can first see that Ker$(YA)^{\perp} = $RowSpace$(YA)$, which is Im$((YA)^T)$ and so has the same dimension as Im$(YA)$ (and coincides with it for symmetric matrices; correct me if I'm wrong). So now we just need to show that Ker$(YA)/$Ker$(A) \cong $Ker$(Y)$ $\cap$ Im$(A)$, the isomorphism being induced by $A$ itself. Observe that Ker$(YA)$ will have two kinds of vectors: they will be (1) those from Ker$(A)$, because these map to the zero vector of $V$, which the linear map $Y$ then takes to the zero vector of $U$, and (2) those $w$ whose image $Aw$ lands in Ker$(Y)$, because then $Y(Aw)$ is the zero vector of $U$ by definition. The quotient vector space Ker$(YA)/$Ker$(A)$ has cosets as its elements, with all of Ker$(A)$ lumped together as the identity coset.
Ker$(Y) \cap$ Im$(A)$ is all of Ker$(Y)$ whenever $A$ is surjective. The rank–nullity theorem tells us that dim(Ker$(A)$) $=$ dim$(W) - $dim(Im$(A)$). The vector-space analogue of Lagrange's theorem then tells us that dim(Ker$(YA)/$Ker$(A)$) $=$ dim(Ker$(YA)$) $-$ dim(Ker$(A)$). This tells us that, for example, if dim(Ker$(A)) = 0$ (the kernel is the zero vector only), then each coset is a single vector, and if $A$ is moreover surjective then everything in Ker$(Y)$ is in Im$(A)$. If instead Ker$(A)$ has dimension $n > 0$, then each coset is an $n$-dimensional translate of Ker$(A)$ (all cosets of a subspace look alike, the analogue of equal coset sizes in group theory), and when dim$(W) = $dim$(V)$ the image no longer spans the whole of $V$. Regardless, the quotient measures how much of Ker$(Y)$ lies inside Im$(A)$, and its coset representatives are carried by $A$ onto basis vectors of that piece. So Ker$(YA)/$Ker$(A)$ picks out exactly the vectors of Ker$(Y)$ that lie in Im$(A)$, giving an injection of Ker$(YA)/$Ker$(A)$ into Ker$(Y) \cap $Im$(A)$. The intersection Ker$(Y)$ $\cap$ Im$(A)$ tells us the same thing, backwards: it is the region of the vector space we are after, and once we find an orthonormal basis for that region, pulling each basis vector back through $A$ yields coset representatives of Ker$(YA)/$Ker$(A)$ (I know that was very hand-wavy, but it works for me at the moment). So Ker$(Y) \cap $Im$(A)$ also injects into Ker$(YA)/$Ker$(A)$, and thus Ker$(Y) \cap $Im$(A) \cong $Ker$(YA)/$Ker$(A)$. So we have that a basis of Ker$(YA)^{\perp}$ and coset representatives of Ker$(YA)/$Ker$(A)$ together give a basis for Ker$(A)^{\perp}$, which we identify with Im$(A)$ as above.
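And the same machinery checks the quotient claim in dimensions, i.e. that dim(Ker$(YA)/$Ker$(A)$) $=$ dim(Ker$(Y) \cap $Im$(A)$) (again with arbitrary random matrices):

```python
import numpy as np
from scipy.linalg import null_space, orth

rng = np.random.default_rng(2)
A = rng.integers(-2, 3, size=(4, 6)).astype(float)  # A : W -> V
Y = rng.integers(-2, 3, size=(3, 4)).astype(float)  # Y : V -> U

dim_ker = lambda M: M.shape[1] - np.linalg.matrix_rank(M)

# dim(Ker(YA)/Ker(A)) = dim Ker(YA) - dim Ker(A)
quotient_dim = dim_ker(Y @ A) - dim_ker(A)

KY, IA = null_space(Y), orth(A)
inter_dim = KY.shape[1] + IA.shape[1] - np.linalg.matrix_rank(np.hstack([KY, IA]))
print(quotient_dim, inter_dim)   # equal, as the isomorphism predicts
```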
For part two:
Note that given a matrix $M$ and a complete orthonormal basis $e_i$ ($e_i$ will be a column vector, and $e_i^T$ a row vector), we can expand $M$ by multiplying it by the sum of the dyads of the $e_i$, which is the identity:
$$M = (M\cdot e_1)e_1^T + (M\cdot e_2)e_2^T + \cdots + (M\cdot e_n)e_n^T = \sum_{i=1}^n(M\cdot e_i)e_i^T,$$
where $n$ is the dimension of the vector space of interest (correct me if I am wrong about that part). This should look familiar from the question. Now observe that one way to pick a complete orthonormal basis adapted to $M$ is to combine a basis of Ker$(M)$ with a basis of Ker$(M)^{\perp}$, which equals Im$(M)$ when $M$ is symmetric (in general Ker$(M)^{\perp} = $Im$(M^T)$). Let $f_j$ be the basis vectors of Ker$(M)$ and $e_i$ the basis vectors of Im$(M)$; then
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(M\cdot f_j)f_j^T.$$
Notice though that for all $v \in \textrm{Ker}(M), M\cdot v = \textbf{0}$, so
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(M\cdot f_j)f_j^T = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T + \sum_{j=1}^{\textrm{dim(Ker}(M))}(\textbf{0})f_j^T = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T,$$
and
$$M = \sum_{i=1}^{\textrm{dim(Im}(M))}(M\cdot e_i)e_i^T = M\sum_{i=1}^{\textrm{dim(Im}(M))}e_ie_i^T.$$
Thus, if $\left\{e_{\alpha},\tilde{e}_{\beta}\right\}$, with $1\leq\alpha\leq s$, $1\leq\beta\leq\delta$ and $s+\delta = \textrm{dim(Im}(M))$, together form a basis for Im$(M)$, then
$$M = M\left\{\sum_{\alpha=1}^se_{\alpha}e_{\alpha}^T + \sum_{\beta=1}^{\delta}\tilde{e}_{\beta}\tilde{e}_{\beta}^T\right\}.$$
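As a quick numpy sanity check of the completeness expansion above, with a random matrix and a random full orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 3))
E, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # columns e_i: orthonormal basis

recon = sum(np.outer(M @ E[:, i], E[:, i]) for i in range(3))
print(np.allclose(recon, M))                  # True: sum_i (M e_i) e_i^T = M
```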
Edit: I think this is a pretty cool result! Generally, given a matrix $M$ with Ker$(M)^{\perp} = $Im$(M)$ (so $M$ has to be square, and symmetric matrices are the standard example), if $\left\{e_i : 1 \leq i \leq \textrm{dim(Im}(M))\right\}$ is an orthonormal basis for Im$(M)$, then $M = M\sum_ie_ie_i^T$. For a general $M$, the same identity holds with an orthonormal basis of the row space Im$(M^T)$ instead.
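Numerically (a sketch; `scipy.linalg.orth` returns an orthonormal basis of the column space): the identity holds for a symmetric $M$, while for a general matrix it is the row-space version that always works:

```python
import numpy as np
from scipy.linalg import orth

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 2))
M = B @ B.T                          # symmetric, rank 2

E = orth(M)                          # orthonormal basis of Im(M)
print(np.allclose(M @ E @ E.T, M))   # True: M = M * sum_i e_i e_i^T

N = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 4))   # non-symmetric, rank 2
C = orth(N)                          # column-space basis: identity can fail
R = orth(N.T)                        # row-space basis: identity always holds
print(np.allclose(N @ C @ C.T, N))   # generally False
print(np.allclose(N @ R @ R.T, N))   # True
```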
Best Answer
First note that $\ker(S)$ is a linear subspace of $\mathbb{R}^m$ (I am assuming $S$ is an $n \times m$ matrix). Next observe that $\mathbb{R}_+^m$ is a polyhedral cone (a cone is polyhedral if it can be expressed as $\{x \mid Ax \leq 0\}$ for some matrix $A$). The intersection of a linear subspace and the cone $\mathbb{R}_+^m$ will be another polyhedral cone. It is worth trying to visualize this. Consider $\mathbb{R}^3$. The non-negative orthant $\{(x,y,z) \mid x\geq 0, y\geq0, z\geq0\}$ is a polyhedral cone. Another description of this cone is that it is the convex hull of the three extreme rays $\{t(1,0,0) \mid t \geq 0\}$, $\{t(0,1,0) \mid t \geq 0\}$, and $\{t(0,0,1) \mid t \geq 0\}$. Now you should be able to convince yourself that the intersection of the non-negative orthant with any hyperplane passing through the origin will result in another polyhedral cone.
Polyhedral cones have a finite number of extreme rays. There are various methods to enumerate these extreme rays. You can truncate the cone and enumerate the extreme points of the resulting polytope; this seems to be the approach in the link you included. There is also very good software that will explicitly enumerate the extreme rays: PORTA and cdd/cdd+ are two that come to mind.
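For tiny instances you can even do the truncation step by brute force; here is a minimal numpy sketch (the $1\times 3$ matrix $S$ below is a made-up example matching the $\mathbb{R}^3$ picture above) that intersects the cone with $\mathbf 1^Tx = 1$ and enumerates the basic feasible solutions of the resulting polytope:

```python
import itertools
import numpy as np

def extreme_rays(S, tol=1e-9):
    """Extreme rays of {x >= 0, Sx = 0}, found by truncating the cone with
    1^T x = 1 and listing the vertices (basic feasible solutions) of the
    resulting polytope.  Brute force over supports; fine only for small m."""
    S = np.atleast_2d(np.asarray(S, dtype=float))
    n, m = S.shape
    A = np.vstack([S, np.ones(m)])        # equalities: S x = 0, 1^T x = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    k = np.linalg.matrix_rank(A)          # a vertex has at most k nonzeros
    rays = []
    for support in itertools.combinations(range(m), k):
        cols = A[:, support]
        if np.linalg.matrix_rank(cols) < k:
            continue                      # not a basis of columns
        xs, *_ = np.linalg.lstsq(cols, b, rcond=None)
        x = np.zeros(m)
        x[list(support)] = xs
        if np.all(x >= -tol) and np.allclose(A @ x, b) \
                and not any(np.allclose(x, r) for r in rays):
            rays.append(x)
    return rays

# Intersect the non-negative orthant of R^3 with the plane x + y - z = 0:
for r in extreme_rays([[1, 1, -1]]):
    print(r)    # rays through (0.5, 0, 0.5) and (0, 0.5, 0.5)
```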