As pointed out by Snowball, the problem is inherently hard; see also this paper.
However, it can be done much faster in general than generating all the codewords.
I explain the idea for linear $[n,k]$ codes $C$ with $n = 2k$.
First, we construct two generator matrices $G_1 = (I\ A)$ and $G_2 = (B\ I)$ of $C$, where $I$ is the $(k\times k)$ identity matrix and $A,B$ are two $(k\times k)$ matrices. This is only possible if the positions $1,\ldots,k$ and $k+1,\ldots,n$ form two information sets of $C$. If this is not the case, we have to find a suitable coordinate permutation first.
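To make the first step concrete: with $G_1 = (I\ A)$, the last $k$ positions form an information set exactly when $A$ is invertible over GF(2), and then $G_2 = A^{-1}G_1 = (A^{-1}\ I)$, i.e. $B = A^{-1}$. Here is a minimal sketch of the GF(2) inversion by Gaussian elimination for the binary case (the function name `gf2_inverse` is mine, not from any library):

```python
import numpy as np

def gf2_inverse(A):
    """Invert a square binary matrix over GF(2) by Gaussian elimination
    on the augmented matrix (A | I). Assumes A is invertible."""
    k = A.shape[0]
    M = np.hstack([A.copy() % 2, np.eye(k, dtype=int)])
    for col in range(k):
        # find a pivot row at or below the diagonal and swap it up
        pivot = next(r for r in range(col, k) if M[r, col])
        M[[col, pivot]] = M[[pivot, col]]
        # eliminate this column from all other rows (addition mod 2)
        for r in range(k):
            if r != col and M[r, col]:
                M[r] = (M[r] + M[col]) % 2
    return M[:, k:]
```

For example, for the extended Hamming $[8,4]$ code one can take $A = J - I$ (all-ones minus identity), and since $(J-I)^2 \equiv I \pmod 2$, the inverse is $B = A$ itself.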
Now a codeword of weight $w$ has weight at most $\lfloor w/2\rfloor$ either on the first $k$ or on the last $k$ coordinates.
Thus, we can find all codewords of weight $\leq w$ among the codewords $x G_1 = (x, xA)$ and $x G_2 = (xB, x)$, where $x$ is a vector of length $k$ and Hamming weight at most $\lfloor w/2\rfloor$.
So we can find the minimum distance $d$ of $C$ by doing this iteratively for $w = 1,2,3,\ldots$ until a nonzero codeword of weight at most $w$ has been found.
In this way, you have to generate only a small fraction of all the codewords to find the minimum distance, and the idea can be generalized to any linear code. The first step then is to find a covering of the coordinates with information sets. However, the algorithm is still exponential, of course.
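The procedure above can be sketched as follows for binary codes. This is my own illustrative implementation (the function name and the convention $B = A^{-1}$ are assumptions for the $n = 2k$ case described above, not a reference implementation):

```python
import itertools
import numpy as np

def min_distance_two_halves(A, B):
    """Minimum distance of the binary [2k, k] code generated by
    G1 = (I | A), assuming G2 = (B | I) generates the same code,
    i.e. both coordinate halves are information sets and B = A^{-1}."""
    k = A.shape[0]
    G1 = np.hstack([np.eye(k, dtype=int), A]) % 2
    G2 = np.hstack([B, np.eye(k, dtype=int)]) % 2
    n = 2 * k
    best = n + 1
    for w in range(1, n + 1):
        # every codeword of weight <= w has weight <= floor(w/2) on one
        # half, so it appears as x G1 or x G2 with wt(x) <= floor(w/2)
        for t in range(1, w // 2 + 1):
            for support in itertools.combinations(range(k), t):
                x = np.zeros(k, dtype=int)
                x[list(support)] = 1
                for G in (G1, G2):
                    wt = int(((x @ G) % 2).sum())
                    if 0 < wt < best:
                        best = wt
        if best <= w:      # every lighter codeword has been enumerated
            return best
    return best
```

For the extended Hamming $[8,4,4]$ code with $A = J - I$ (so $B = A^{-1} = A$), this returns $4$, while enumerating only vectors $x$ of weight at most $2$ instead of all $2^4$ message vectors; for larger $k$ the saving is far more dramatic.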
The webpage codetables.de is a valuable source for bounds on the best known minimum distances for fixed $q$, $n$ and $k$.
The proof seems wrong. It should read "Then, the columns $i_1,\ldots,i_u$ of the parity check matrix must be linearly dependent." Also, one should exclude the zero codeword. Corrected:
Let $v \in C$, $v\ne 0$, and let $i_1, \ldots, i_u$ be the indices of the non-zero components of $v$. Then the columns $i_1, \ldots, i_u$ of the parity check matrix must be linearly dependent. Choosing a codeword $v$ of weight $d$ thus yields $d$ linearly dependent columns of $H$. Conversely, if every set of $d-1$ columns of $H$ is linearly independent, then $u \ge d$ must hold for every non-zero codeword.
I agree that it can be confusing. I would rather prove it thus:
First assume $d(C)=d$. Then there is some codeword $c_0$ of weight $d$, and since $c_0 H^t = 0$, the $d$ columns of $H$ corresponding to the non-zero entries of $c_0$ are LD (linearly dependent). Furthermore, there cannot be a set of $m$ LD columns with $0<m<d$: in that case we could write $x H^t = 0$ for some vector $x$ with $m<d$ non-zero entries, but the vectors satisfying that equation are exactly the codewords, and there is no non-null codeword of weight less than $d$.
The other direction goes similarly. If there is some set of $d$ LD columns, but no such set of $0<m<d$ columns, then the non-null solutions of $x H^t = 0$ have minimum weight $d$; but these are exactly the codewords, hence the minimum distance is $d$.
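This characterization also gives a direct (if naive) way to compute $d$ from $H$: over GF(2), a minimal linearly dependent set of columns is one whose columns sum to zero, so $d$ is the smallest number of columns of $H$ summing to zero. A small sketch (function name is mine):

```python
import itertools
import numpy as np

def min_distance_from_H(H):
    """Smallest d such that some d columns of the binary parity check
    matrix H are linearly dependent, i.e. sum to zero over GF(2).
    By the argument above, this is the minimum distance of the code."""
    n = H.shape[1]
    for d in range(1, n + 1):
        for cols in itertools.combinations(range(n), d):
            # over GF(2), a minimal dependent set sums to the zero vector
            if not (H[:, list(cols)].sum(axis=1) % 2).any():
                return d
    return None  # H has full column rank: the code is trivial
```

For the $[7,4]$ Hamming code, whose parity check matrix has all seven non-zero binary columns of length $3$, this returns $3$: no column is zero, no two columns are equal, but e.g. columns $001$, $010$, $011$ sum to zero.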
Your observation in (1) looks correct. The empty set is never linearly dependent, and a set with one vector can only be linearly dependent if that vector is zero.
For (2), while a $0 \times 1$ matrix has zero rows, it does have one column. In this case, the one column is the zero vector (of the zero-dimensional vector space), and so your $H$ does have a zero column.
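If it helps to see this concretely, numpy (my choice of tool here, not implied by the question) distinguishes exactly this case:

```python
import numpy as np

# a "0 x 1" matrix: zero rows, but still one column
H = np.zeros((0, 1))
print(H.shape)   # (0, 1) -- one column
print(H[:, 0])   # the single column: the empty (zero) vector
```

The single column is the unique vector of the zero-dimensional space, which is by definition the zero vector, matching the observation above.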