This question uses many names in a nonstandard way (at least as far as the
coding theory literature is English is concerned), and this confuses the issues.
The codes in question are single-error-correcting codes but are not
Hamming codes in general. By Hamming code, one usually means a $[2^m-1,2^m-1-m]$
linear code with minimum distance $3$. It is a single-error-correcting
code, but not all single-error-correcting codes are Hamming codes. Some might
be shortened Hamming codes (set some data bits to $0$ and don't transmit
them at all) but even this is not strictly necessary.
So, what are these "rectangular" codes? They are punctured product codes,
where by a product code $\mathcal C_1\times \mathcal C_2$ is meant a
rectangular $k_2\times k_1$ array of data bits bordered by parity bits
such that each row is a codeword in $\mathcal C_1$ and each column is
a codeword in $\mathcal C_2$ where, of course, $\dim(\mathcal C_i) = k_i$.
The minimum distance of $\mathcal C_1\times \mathcal C_2$ is the product
of the minimum distances of $\mathcal C_1$ and $\mathcal C_1$.
If $\mathcal C_1$ and $\mathcal C_2$ are $[k_i+1,k_i]$ single parity check
codes, then the minimum distance of $\mathcal C_1\times \mathcal C_2$ is $4$.
But, the codes used by the OP are punctured product codes since the
lower right-hand corner
of the array (the parity of the parity bits!) has been deleted, and the
minimum distance reduces by $1$.
Codewords in a product code can be regarded as a single vector of
length $n_1n_2$ (just concatenate the rows into a single codeword)
and so $\mathcal C_1\times \mathcal C_2$ is a $[n_1n_2,k_1k_2, d_1d_2]$
linear code. The codes called Hamming codes by the OP's instructor
are thus $[n_1n_2-1,k_1k_2, 3]$ single-error-correcting codes, but
they are not Hamming codes. Note for example that with $k_1=k_2=2$,
the construction gives a $[8,4,3]$ code (punctured from a $[9,4,4]$
single-error-correcting, double-error-detecting code),
and this is not the same as
the $[7,4,3]$ Hamming code that all of us know and love.
Of course, error correction is easy with this code. If
one row parity-check sum fails as does one column parity-check sum,
the error is in the data bit in that row and column. If only a
row sum fails or only a column sum fails, the parity bit in the
failed sum was received incorrectly.
-------
Sheesh! OK, $$\begin{align}c_1&=u_1\\c_2&=u_2\\c_3&=u_1+u_2\\c_4&=u_3\\c_5&=u_4\\c_6&=u_3+u_4\\c_7&=u_1+u_3\\c_8&=u_2+u_4\end{align}$$ giving $$\left[\begin{matrix}c_1\\c_2\\c_3\\c_4\\c_5\\c_6\\c_7\\c_8\end{matrix}\right]=\left[\begin{matrix}1&0&0&0\\0&1&0&0\\1&1&0&0\\0&0&1&0\\0&0&0&1\\1&0&1&0\\0&1&0&1\end{matrix}\right]\left[\begin{matrix}u_1\\u_2\\u_2\\u_4\end{matrix}\right]$$ Can you figure out $\mathbf G$ now? It will not be of the form $[I\mid P]$.
Let $m>n$, and $\{v_1,\ldots,v_{n+1}\}$ be random vectors in $\mathbb{F}^m_2$.
Call $p_n$ the probability that $\{v_1,\ldots,v_n\}$ are linearly independent.
In that case, $\{v_1,\ldots,v_{n+1}\}$ are independent (event $A_n$) if and only if $\{v_1,\ldots, v_n\}$ are independent and $v_{n+1} \notin \left<v_1,\ldots,v_n\right>$ (event $B_n$).
$$p_{n+1} = P(A_n \cap B_n) = P(A_n)P(B_n|A_n) = (1-2^{n-m})p_n$$
$p_1 = (1-2^{-m})$, since a single vector is independent if and only if it's not $0$, so:
$$p_n = \prod_{i=1}^n (1-2^{i-1-m})$$
Best Answer
As an example showing, why the answer depends on $P$, consider the matrices ($n=4$, $k=u=2$) $$ G_1=\pmatrix{1&0&1&1\cr0&1&0&0\cr},\quad G_2=\pmatrix{1&0&1&0\cr0&1&0&1},\quad G_3=\pmatrix{1&0&1&1\cr0&1&0&1}. $$ Two columns of $G_1$ will be linearly independent, iff the second column is among them. There the probability of a full rank submatrix is $3/6=1/2$. OTOH two columns of $G_2$ will be linearly independent unless you happened to pick 1st and 3rd or 2nd and 4th, so the probability of full rank is $4/6=2/3$. As the last case, two columns of $G_3$ will be linearly independent iff they are distinct, so the only pair violating that condition is 1st and 3rd. Therefore the probability of full rank has gone up to $5/6$.
It is easy to prove that with this set of parameters: $N=4, k=u=2$ it is impossible that the probability of a full rank submatrix would be equal to $1.$ Probability of full rank can be $1$ only, if the minimum Hamming distance of the dual code is $>u$ (and this is not possible for many choices of $n,k,u$). The way to see this is to view $G$ as a parity check matrix. If some $u$ columns are linearly dependent, then some subset of those columns sums to zero. Putting 1s in the positions of that subset we get a non-zero word of the dual code that has weight $\le u$. If the minimum Hamming distance of the dual code is $>u$, such sets do not exist, and the probability of full rank is equal to $1$.
So it all comes down to finding the minimum Hamming distance of the dual code.