Dealing with degeneracy
Even if we don't use a simplex method approach, we're going to have to borrow an idea from lexicographic pivoting.
Specifically, let $H = \{h_1, h_2, \dots, h_m\}$ be any basis. We switch to the following problem, for some $\epsilon>0$: we maximize $cx$ subject to $Ax = b', x \ge 0$, where $$b' = b'(\epsilon) = b + \sum_{i=1}^m \epsilon^i A_{h_i}.$$ This perturbation guarantees the following:
- The system $Ax = b', x \ge 0$ is still feasible. We simply take a solution $x$ to $Ax=b, x \ge 0$, and then replace $x_{h_i}$ by $x_{h_i}' = x_{h_i} + \epsilon^i$ for each $h_i \in H$.
- For all but finitely many $\epsilon>0$, $Ax=b'$ has no degenerate solutions: no solutions $x$ with fewer than $m$ nonzero coordinates. This is because the vectors $b'(\epsilon)$ for any $m+1$ distinct values of $\epsilon>0$ span $\mathbb R^m$, so they cannot all be of the form $A_I x_I$ for a fixed $I$ with $|I|<m$. Therefore for each such $I$, at most $m$ values of $\epsilon$ are bad, and there are only finitely many sets $I$.
- In fact, more generally, for all but finitely many $\epsilon>0$, $b'(\epsilon)$ is not of the form $A_I x_I$ for any $I$ such that $\operatorname{rank}(A_I) < m$. The argument is the same.
For the remainder of the proof, we'll take $\epsilon>0$ small enough to avoid the finitely many bad values of $\epsilon$.
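To make the perturbation concrete, here is a minimal NumPy sketch on a made-up $2 \times 4$ system (all data hypothetical): it builds $b'(\epsilon)$ and checks that the shifted point from the first bullet is still feasible.

```python
import numpy as np

# Hypothetical 2x4 example: A has full row rank m = 2, and H = (h_1, h_2)
# indexes a basis (columns 0 and 1 here).
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
H = [0, 1]            # basis indices h_1, h_2
eps = 1e-3            # some small epsilon > 0

# b'(eps) = b + sum_i eps^i * A_{h_i}
b_prime = b + sum(eps**(i + 1) * A[:, h] for i, h in enumerate(H))

# A feasible x for Ax = b (here x = (2, 3, 0, 0)) stays feasible after
# shifting x_{h_i} by eps^i, exactly as in the first bullet point.
x = np.array([2.0, 3.0, 0.0, 0.0])
x_shifted = x.copy()
for i, h in enumerate(H):
    x_shifted[h] += eps**(i + 1)

assert np.allclose(A @ x_shifted, b_prime)
```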
Finding a basic optimal solution
Next, we argue that the LP still has an optimal solution. This is one part where my argument is sketchy, because I don't have a good proof of the trichotomy that each LP either has an optimal solution, or is infeasible, or is unbounded. (If the feasible region is bounded, then this is easy by compactness, but if it's not, then we somehow need to use the linearity of the constraints. Even convexity is not enough!) Anyway, given that trichotomy, we know that our system is not infeasible, and weak duality is enough to know that it's not unbounded: for any dual feasible $y$, $y b'$ is an upper bound.
Let $x^*$ be our optimal solution; let $I = \{i : x_i^* \ne 0\}$ and $J = \{j : x_j^* = 0\}$. We can't have $|I|<m$ (for small enough $\epsilon>0$) because we ruled out degenerate solutions, but we might still have $|I|>m$, and we need to deal with that next.
In this case, we can solve $A u = 0, u_J = 0$ for a nonzero vector $u$. Now, consider replacing $x^*$ by $x^* + tu$ for some $t$. We are only changing the positive coordinates of $x^*$, so for $t$ in some neighborhood of $0$, this is still feasible. Moreover, all such points are still optimal: $c(x^* + tu) = cx^* + t \cdot cu$, so if $c(x^* + tu) < cx^*$ for $t$ of one sign, then $c(x^* + tu) > cx^*$ for $t$ of the opposite sign, contradicting the optimality of $x^*$. Hence $cu = 0$.
But taking $t = -\frac{x_i^*}{u_i}$ for some $i \in I$ with $u_i \ne 0$ sets one more coordinate of $x^*$ to $0$, and choosing the $i$ that minimizes $\left|\frac{x_i^*}{u_i}\right|$ keeps all the other coordinates nonnegative. Repeating this process, we can reduce $|I|$ until we have $|I|=m$. We still have $\operatorname{rank}(A_I) = m$ by the third bullet point in the first section, so $A_I$ is invertible.
At this point, we have $x^* = A_I^{-1} b'$, which is (kind of) half of what we wanted.
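The support-reduction loop can be sketched numerically as follows (made-up data; the null-space vector comes from an SVD, and the step of smallest magnitude is one safe choice of $t$):

```python
import numpy as np

# Sketch of the support-reduction step on a made-up instance with m = 2.
# x_star is a feasible point for A x = b', x >= 0 with |I| = 3 > m.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x_star = np.array([1.0, 2.0, 3.0])   # all coordinates positive: I = {0, 1, 2}
b_prime = A @ x_star

I = np.flatnonzero(x_star > 0)
while len(I) > A.shape[0]:
    # Nonzero u with A u = 0, u_J = 0: take a null-space vector of A_I.
    # Since rank(A_I) <= m < |I|, the last right singular vector works.
    _, _, Vt = np.linalg.svd(A[:, I])
    u_I = Vt[-1]
    u = np.zeros_like(x_star)
    u[I] = u_I
    # Candidate steps t = -x_i / u_i each zero out coordinate i; the
    # candidate of smallest magnitude keeps every other coordinate >= 0.
    valid = np.abs(u_I) > 1e-12
    t_candidates = -x_star[I][valid] / u_I[valid]
    t = t_candidates[np.argmin(np.abs(t_candidates))]
    x_star = x_star + t * u
    x_star[np.abs(x_star) < 1e-9] = 0.0   # clean up float round-off
    I = np.flatnonzero(x_star > 0)

# Still feasible, now with support of size m.
assert np.allclose(A @ x_star, b_prime) and np.all(x_star >= 0)
```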
Finding the dual solution
By strong duality, there is a dual solution $y$ such that $cx^* = yb'$. Strong duality is also sometimes proved using the simplex method, but there are other ways to do it: for example, using Farkas's lemma, which in turn can be shown using some separating hyperplane arguments. I'll skip this here for now, but I can elaborate if necessary.
By complementary slackness, we know that for every $i \in I$, since $x_i^* > 0$, we must have $yA_i = c_i$. This tells us how to solve for $y$: we have $yA_I = c_I$, so $y = c_I A_I^{-1}$.
Because $y$ is feasible, we have $yA_J \ge c_J$, which we can rewrite as $c_J - c_I A_I^{-1}A_J \le 0$. This is the second half of what we wanted.
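As a small numerical sanity check on a made-up instance (not tied to any particular $b'$), the formulas $x_I = A_I^{-1} b'$ and $y = c_I A_I^{-1}$, together with the reduced-cost condition $c_J - c_I A_I^{-1} A_J \le 0$, look like this:

```python
import numpy as np

# Hypothetical instance where the basis I = {0, 1} happens to be optimal:
# maximize c x subject to A x = b, x >= 0.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
c = np.array([1.0, 1.0, 1.0])
I, J = [0, 1], [2]

A_I_inv = np.linalg.inv(A[:, I])
x_I = A_I_inv @ b                  # basic solution: x_I = A_I^{-1} b
y = c[I] @ A_I_inv                 # dual solution: y = c_I A_I^{-1}

# Complementary slackness gives y A_I = c_I; dual feasibility on J is
# the reduced-cost test c_J - c_I A_I^{-1} A_J <= 0.
reduced_costs = c[J] - y @ A[:, J]
assert np.allclose(y @ A[:, I], c[I])
assert np.all(reduced_costs <= 0)
# Primal and dual objectives agree, as strong duality predicts.
assert np.isclose(c[I] @ x_I, y @ b)
```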
Getting rid of $\epsilon$
Suppose that all $\epsilon \in (0, \frac1k]$ are sufficiently small for the arguments above to work. Then for every $\epsilon$ in the sequence $\frac1k, \frac1{k+1}, \frac1{k+2}, \dots$, we can find a basis $I$ such that $A_I^{-1}b'(\epsilon) \ge 0$ and $c_J - c_I A_I^{-1}A_J \le 0$.
By the pigeonhole principle, there is a basis $I$ that works for infinitely many $\epsilon$ in this sequence. Now take the limit of $A_I^{-1}b'(\epsilon)$ as $\epsilon \to 0$ along the sequence we found: since every $A_I^{-1}b'(\epsilon)$ is nonnegative, the limit $A_I^{-1}b'(0) = A_I^{-1}b$ is nonnegative as well.
We conclude that for this basis $I$, $A_I^{-1}b \ge 0$ and $c_J - c_I A_I^{-1}A_J \le 0$, so we have found a feasible basis.
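The limiting argument can also be checked numerically; in this hypothetical example the basis $I$ works for every $\epsilon = 1/k$, and $A_I^{-1} b'(\epsilon)$ converges to the nonnegative vector $A_I^{-1} b$:

```python
import numpy as np

# Made-up data: A_I is a basis, H indexes the perturbation columns.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
H = [0, 1]
I = [0, 1]
A_I_inv = np.linalg.inv(A[:, I])

def b_prime(eps):
    # b'(eps) = b + sum_i eps^i * A_{h_i}
    return b + sum(eps**(i + 1) * A[:, h] for i, h in enumerate(H))

# The basis stays feasible along the sequence eps = 1/k ...
for k in [10, 100, 1000, 10000]:
    assert np.all(A_I_inv @ b_prime(1.0 / k) >= 0)

# ... and nonnegativity survives in the limit eps -> 0.
assert np.all(A_I_inv @ b >= 0)
```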
Best Answer
If $b \ge 0$, the feasible region is nonempty (because $0$ is feasible); the feasible region is unbounded iff the linear programming problem
P: maximize $e^T x$ subject to $A x \le b$, $x \ge 0$
is unbounded (where $e$ is the vector of all 1's), and this is true iff the dual problem
D: minimize $b^T y$ subject to $A^T y \ge e$, $y \ge 0$
is infeasible. This in turn is equivalent to the statement that no nonnegative linear combination of the rows of $A$ has all entries $> 0$.
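Here is a minimal NumPy sketch of both sides of this criterion on two toy matrices (both hypothetical): for one, no nonnegative row combination is strictly positive and the region is unbounded along a ray; for the other, the all-ones combination certifies boundedness via dual feasibility.

```python
import numpy as np

# A1's single row (1, -1) admits no nonnegative combination with all
# entries positive, so {A1 x <= b, x >= 0} is unbounded: the ray
# x(t) = (t, t) stays feasible for every t >= 0.
A1 = np.array([[1.0, -1.0]])
b = np.array([1.0])
for t in [0.0, 10.0, 1e6]:
    x = np.array([t, t])
    assert np.all(A1 @ x <= b) and np.all(x >= 0)

# For A2, the combination 1*row1 + 1*row2 = (1, 1) is strictly positive:
# y = (1, 1) >= 0 satisfies A2^T y >= e, a feasible point for the dual D,
# so the primal P (and hence the feasible region) is bounded.
A2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
y = np.array([1.0, 1.0])
assert np.all(y @ A2 >= 1) and np.all(y >= 0)
```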