As I interpret your book's definition of an "information set", the "set of coordinates" that corresponds to a set of columns is literally the index of those columns in the matrix $G$. So if $$G=\left[\begin{array}{cccc|ccc}
1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right],$$
the "set of coordinates" corresponding to "the first, second, fourth, and seventh columns" is literally just $(1,2,4,7)$. Since the seventh column is the sum of the first, second, and fourth, these columns are not linearly independent, so this is an example of a set of four coordinates that do not form an information set.
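To make the dependence concrete, here is a small Python sketch (mine, not from the answer) that computes ranks over $\mathbb{F}_2$; `gf2_rank` is an illustrative helper name:

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over GF(2) via Gaussian elimination mod 2."""
    M = np.array(M, dtype=int) % 2
    rank = 0
    for col in range(M.shape[1]):
        # Find a pivot row with a 1 in this column, at or below `rank`.
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]          # swap pivot into place
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] = (M[r] + M[rank]) % 2          # eliminate mod 2
        rank += 1
    return rank

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# Columns 1, 2, 4, 7 (0-indexed 0, 1, 3, 6): rank 3 < 4, so dependent.
print(gf2_rank(G[:, [0, 1, 3, 6]]))  # 3 -> not an information set
# Columns 1, 2, 3, 4 form the identity: rank 4 -> an information set.
print(gf2_rank(G[:, [0, 1, 2, 3]]))  # 4
```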
That $G$ has four linearly independent columns (equivalently, rank $4$) is what allows it to encode a message of length four uniquely. If you take any four-digit message $m$ and compute $mG$, the resulting 7-digit codeword can be translated uniquely back to the original message; if $G$ had fewer than $4$ independent columns, this multiplication would yield the same result for more than one 4-digit message, and in that sense you would lose information: you could never be sure you had decoded certain messages correctly.
Another way to think of the situation is that because $G$ has four independent columns, it has rank $4$, and thus when you put it in row-reduced echelon form it becomes a standard generator matrix of the form $[I\mid P]$, possibly after a column permutation (the $G$ in this example already is a standard generator matrix). If it weren't, but still had four independent columns, you could convert it easily. Now, if you look at what happens when you compute $mG$, the result is the message $m$ itself with three additional parity digits tacked onto the end. If the columns of the identity matrix sat at other coordinates, the codeword would contain the digits of $m$ at those coordinates, with parity digits inserted where the other columns are.
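As a quick illustration of both points (a Python sketch, assuming the $G$ above), every 4-bit message encodes to a distinct codeword, and each codeword begins with the message itself:

```python
import itertools
import numpy as np

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# Encode every 4-bit message. Because G = [I | P], each codeword starts
# with the message itself, and because rank(G) = 4 all codewords differ.
codewords = {}
for m in itertools.product([0, 1], repeat=4):
    c = tuple(np.array(m) @ G % 2)
    assert c[:4] == m          # systematic: the message appears verbatim
    codewords[c] = m

print(len(codewords))  # 16 distinct codewords -> encoding is invertible
```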
I hope this rambling was useful rather than even more confusing. This document has a pretty good explanation of information sets as well (see p. 39). (I'm not sure whom to credit, but the document originally came from Jonathan Hall's MSU webpage.)
This question uses many names in a nonstandard way (at least as far as the
coding theory literature in English is concerned), and this confuses the issues.
The codes in question are single-error-correcting codes but are not
Hamming codes in general. By Hamming code, one usually means a $[2^m-1,2^m-1-m]$
linear code with minimum distance $3$. It is a single-error-correcting
code, but not all single-error-correcting codes are Hamming codes. Some might
be shortened Hamming codes (set some data bits to $0$ and don't transmit
them at all) but even this is not strictly necessary.
So, what are these "rectangular" codes? They are punctured product codes,
where by a product code $\mathcal C_1\times \mathcal C_2$ is meant a
rectangular $k_2\times k_1$ array of data bits bordered by parity bits
such that each row is a codeword in $\mathcal C_1$ and each column is
a codeword in $\mathcal C_2$ where, of course, $\dim(\mathcal C_i) = k_i$.
The minimum distance of $\mathcal C_1\times \mathcal C_2$ is the product
of the minimum distances of $\mathcal C_1$ and $\mathcal C_2$.
If $\mathcal C_1$ and $\mathcal C_2$ are $[k_i+1,k_i]$ single parity check
codes, then the minimum distance of $\mathcal C_1\times \mathcal C_2$ is $4$.
But, the codes used by the OP are punctured product codes since the
lower right-hand corner
of the array (the parity of the parity bits!) has been deleted, and the
minimum distance reduces by $1$.
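The full (unpunctured) array can be sketched as follows, assuming two single-parity-check component codes; `spc_product_encode` is an illustrative helper name, and the corner (parity of parities) is still included here:

```python
import numpy as np

def spc_product_encode(data):
    """Encode a k2 x k1 binary data array in the product of two
    single-parity-check codes: append a parity bit to each row and to
    each column, including the parity-of-parity corner."""
    data = np.array(data) % 2
    k2, k1 = data.shape
    out = np.zeros((k2 + 1, k1 + 1), dtype=int)
    out[:k2, :k1] = data
    out[:k2, k1] = data.sum(axis=1) % 2       # row parity bits
    out[k2, :] = out[:k2, :].sum(axis=0) % 2  # column parities (and corner)
    return out

word = spc_product_encode([[1, 0], [1, 1]])
print(word)
# Every row and every column of the result has even parity:
assert not (word.sum(axis=1) % 2).any() and not (word.sum(axis=0) % 2).any()
```

Deleting the bottom-right entry of `word` gives the punctured code the OP is using.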
Codewords in a product code can be regarded as a single vector of
length $n_1n_2$ (just concatenate the rows into a single codeword)
and so $\mathcal C_1\times \mathcal C_2$ is an $[n_1n_2,k_1k_2, d_1d_2]$
linear code. The codes called Hamming codes by the OP's instructor
are thus $[n_1n_2-1,k_1k_2, 3]$ single-error-correcting codes, but
they are not Hamming codes. Note for example that with $k_1=k_2=2$,
the construction gives an $[8,4,3]$ code (punctured from a $[9,4,4]$
single-error-correcting, double-error-detecting code),
and this is not the same as
the $[7,4,3]$ Hamming code that all of us know and love.
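These parameters can be checked by brute force; the following Python sketch (mine, not from the answer) enumerates the 16 codewords of the $[9,4,4]$ product code and its puncture:

```python
import itertools
import numpy as np

def encode(data):
    """3x3 product-of-SPC codeword from a 2x2 data array, as a length-9 vector."""
    out = np.zeros((3, 3), dtype=int)
    out[:2, :2] = data
    out[:2, 2] = out[:2, :2].sum(axis=1) % 2   # row parities
    out[2, :] = out[:2, :].sum(axis=0) % 2     # column parities, corner last
    return out.ravel()

full = [encode(np.reshape(m, (2, 2))) for m in itertools.product([0, 1], repeat=4)]

# For a linear code, minimum distance = minimum weight of a nonzero codeword.
d_full = min(c.sum() for c in full if c.sum() > 0)
# Puncture: delete the last coordinate (the parity-of-parity corner).
d_punct = min(c[:-1].sum() for c in full if c.sum() > 0)
print(d_full, d_punct)  # 4 3
```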
Of course, error correction is easy with this code. If
one row parity-check sum fails as does one column parity-check sum,
the error is in the data bit in that row and column. If only a
row sum fails or only a column sum fails, the parity bit in the
failed sum was received incorrectly.
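The decoding rule can be sketched as follows (a Python illustration for the full, unpunctured array, where every single bit error trips exactly one row check and one column check; `decode_single_error` is an illustrative name):

```python
import numpy as np

def decode_single_error(word):
    """Correct at most one bit error in a received product-of-SPC array
    (full product code, parity-of-parity corner included)."""
    word = np.array(word) % 2
    row_fail = np.flatnonzero(word.sum(axis=1) % 2)  # rows with odd parity
    col_fail = np.flatnonzero(word.sum(axis=0) % 2)  # columns with odd parity
    if len(row_fail) == 1 and len(col_fail) == 1:
        word[row_fail[0], col_fail[0]] ^= 1          # flip bit at the crossing
    return word

# A valid codeword with one data bit flipped:
sent = np.array([[1, 0, 1],
                 [1, 1, 0],
                 [0, 1, 1]])
received = sent.copy()
received[0, 1] ^= 1
print(np.array_equal(decode_single_error(received), sent))  # True
```

In the punctured code the corner is absent, so a corner-row or corner-column parity bit error trips only one check, which is exactly the second case described above.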
-------
Sheesh! OK, $$\begin{align}c_1&=u_1\\c_2&=u_2\\c_3&=u_1+u_2\\c_4&=u_3\\c_5&=u_4\\c_6&=u_3+u_4\\c_7&=u_1+u_3\\c_8&=u_2+u_4\end{align}$$ giving $$\left[\begin{matrix}c_1\\c_2\\c_3\\c_4\\c_5\\c_6\\c_7\\c_8\end{matrix}\right]=\left[\begin{matrix}1&0&0&0\\0&1&0&0\\1&1&0&0\\0&0&1&0\\0&0&0&1\\0&0&1&1\\1&0&1&0\\0&1&0&1\end{matrix}\right]\left[\begin{matrix}u_1\\u_2\\u_3\\u_4\end{matrix}\right]$$ Can you figure out $\mathbf G$ now? It will not be of the form $[I\mid P]$.
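For a sanity check (a sketch under the row-vector convention $c = u\mathbf G$, so $\mathbf G$ is the transpose of the $8\times 4$ matrix assembled directly from the defining equations above):

```python
import numpy as np

# One row per equation: c1=u1, c2=u2, c3=u1+u2, c4=u3,
# c5=u4, c6=u3+u4, c7=u1+u3, c8=u2+u4.
A = np.array([[1,0,0,0],
              [0,1,0,0],
              [1,1,0,0],
              [0,0,1,0],
              [0,0,0,1],
              [0,0,1,1],
              [1,0,1,0],
              [0,1,0,1]])
G = A.T   # generator matrix for the row-vector convention c = u G

u = np.array([1, 0, 1, 1])
c = u @ G % 2
print(c)  # [1 0 1 1 1 0 0 1]
```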
Best Answer
It's sufficient to find a generating matrix.
A linear code is usually defined as a subspace of $F^n$ for some field $F$ (since you're talking about bits, you can take $F = \mathbb{F}_2 = \{0,1\}$). The code $C$ generated by a generating matrix $G$ is the span of the rows of $G$. The span of a set of vectors in $F^n$ is a subspace of $F^n$, so $C$ is a linear code.
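As a concrete illustration (a Python sketch, using the $4\times 7$ generator matrix from the first answer as an example), one can enumerate the span of the rows and verify the subspace properties directly:

```python
import itertools
import numpy as np

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# The code C is the set of all GF(2) linear combinations of the rows of G.
C = {tuple(np.array(m) @ G % 2) for m in itertools.product([0, 1], repeat=4)}

# Subspace checks: contains 0 and is closed under addition. Over GF(2),
# closure under addition already implies closure under scalar multiples.
assert (0,) * 7 in C
assert all(tuple((np.array(a) + b) % 2) in C for a in C for b in C)
print(len(C))  # 16 = 2^4 codewords, a 4-dimensional subspace of F_2^7
```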