Ok, so I'm not totally sure whether this addresses all (some!) of your doubts, so let me know if it does not. As a disclaimer, what follows is just one out of several ways to view duality in optimisation (you can find others in the answers to this post).
To make things a bit concrete I'm going to assume that your primal problem is a minimisation problem with objective $f:\mathbb{R}^n\to\mathbb{R}$ and set of feasible points $X\subseteq\mathbb{R}^n$. Suppose we can find some function $\phi:\mathbb{R}^m\to\mathbb{R}$ and set $Y\subseteq\mathbb{R}^m$ that has the following property:
$\phi$ evaluated at any point $y\in Y$ gives a lower bound for $f$ over all of $X$. That is, for any $y\in Y$
$$f(x)\geq \phi(y)\quad\quad\forall x\in X.$$
Note that the above implies that for any $y\in Y$, $\phi(y)$ is also a lower bound for the optimum value of the primal (that is, $p^*:=\inf_{x\in X}f(x)$). Then an interesting question to ask is what is the best lower bound we can extract from $\phi$ and $Y$? What is $d^*:=\sup_{y\in Y}\phi(y)$?
We call answering this question "a dual problem to the original primal problem" and we say that "strong duality" holds for this primal-dual pair if $d^*=p^*$.
Strong duality is handy because (assuming we can solve the dual problem) it gives us a way to check how suboptimal a candidate optimal point $\tilde{x}$ (which we have computed in some way) of the primal is by simply computing the difference $f(\tilde{x})-d^*$ (otherwise, since we don't know $p^*$, it is hard to judge how good is our candidate optimal point $\tilde{x}$). Indeed, if $f(\tilde{x})=d^*$ we know that $\tilde{x}$ is optimal and we can stop looking for a better $x$.
In order for the above to be of use, we should aim to construct a dual problem that is simple to solve. This is where the Lagrangian dual problem comes in; it is a dual problem which is always convex irrespective of whether the primal is.
The objective function of the Lagrangian dual problem is
$$\phi(y):=\inf_{x\in X}L(x,y)$$
where $L$ is the so-called "Lagrangian function" and $y$ are the "Lagrangian multipliers".
To check whether strong duality holds for a given primal-dual pair there is a whole battery of sufficient conditions collectively referred to as constraint qualifications. The most basic of which applies to the case where the primal is convex and the dual is the Lagrangian dual is called Slater's condition. It asks that there exists at least one "strictly feasible" point for the primal (that is, that $X$ has non-empty relative interior). If I remember correctly, in this case strong duality holds if and only if $L(x,y)$ has a saddle point (which turns out to be a pair of optimal point $x$ and multipliers $y$), however I don't have an intuitive explanation why this is so.
With regards to how to actually find an optimal point $x^*$ what typically happens in practice is that people run an optimisation algorithm (often gradient decent or Newton's) on both (and separately) the primal and dual problems which hopefully generate some sequences $(x_n)$ and $(y_n)$ such that
$$f(x_1)\geq f(x_2)\geq f(x_3)\geq\dots,\quad\quad L(y_1)\leq L(y_2)\leq L(y_3)\leq\dots.$$
Every few iterations they stop and check whether the difference $f(x_n)-L(y_n)$ is smaller than some tolerance level and if so stop completely and say that $x_n$ optimal. They call the corresponding $y_n$ a certificate of optimality, because anyone can then verify that $x_n$ is (nearly) optimal by checking that the mentioned difference is small. This is the essence of the so called primal-dual solvers. Note that if the primal, said algorithms is guaranteed to generate $x_n$ such that $f(x_n)\to p^*$ (ditto with respect to the dual problem).
You can find a proof of the statement in the table that you quote in the book
Linear and Integer Programming: Theory and Practice, Second Edition
pages 141-145, proof of Theorem 4.5.
However the exact statement is that existance of a degenerate and (additionally) unique solution of the primal implies multiple solutions to the dual.
Using the notation of the the above cited book, the proof of this statement proceeds as follows:
Let $x^*$ be a primal optimal basic feasible solution with basis variables $x_B^*$ (from this and due to the duality theorems you already know that the dual has a finite optimal solution). Since it is degenerate then there exists $$x_{i_0}^*=0, \qquad \text{ with } i_0 \in B$$ But the variable $y_{j_0}^*$ (that corresponds to $x_{i_0}$) of the corresponding dual basic feasible solution $y^*$ is nonbasic and therefore $$y_{j_0}^*=0 \tag1$$ On the other the Strong Complementary Slackness Theorem
(Theorem 3.6 of the book) implies that (given that primal in standard
form has an optimal solution) there exists a pair of primal and dual
optimal solutions $x^*$ and $y^*$ such that for each pair of
complementary dual variables $(x_i^*,y_j^*)$ it holds that either
$x_i^*>0$ or $y_j^*>0$ (or both).
Thus, due to uniqueness of the primal optimal solution (so this assumption is necessary for this proof) there is a dual optimal solution (not necessarily basic feasible) for which $$y_{j_0}^*> 0 \tag{2}$$
Relations $(1)$ and $(2)$ together imply that there exist multiple optimal dual solutions.
Edit 1: According to this question and answer you can see that it is impossible that both the primal and the dual have simultaneously multiple solutions. Together with the above proof this answers your question.
Best Answer
Suppose that a duality gap does occur and that $x_P=x_D.$ The duality gap means that $$f(x_D) + \lambda_D(1-g(x_D)) > f(x_P),$$ so $x_P=x_D$ means $$\lambda_D(1-g(x_P)) > 0.$$ Thus $\lambda_D > 0$ and $g(x_P) < 1.$
Now suppose that $\epsilon > 0$ is small enough that $\lambda_D - \epsilon > 0,$ and suppose that $\hat{x}\in X$ maximizes $f(x) + (\lambda_D - \epsilon)(1 - g(x))$ over $X.$ From the optimality of the original dual solution $(\lambda_D, x_D),$ $$f(\hat{x}) + (\lambda_D - \epsilon)(1 - g(\hat{x})) \ge f(x_D) + \lambda_D(1-g(x_D))\\= f(x_P) + \lambda_D(1-g(x_P)).$$ On the other hand, optimality of $\hat{x}$ for $\lambda = \lambda_D-\epsilon$ means $$f(\hat{x}) + (\lambda_D - \epsilon)(1 - g(\hat{x})) \le f(x_P) + (\lambda_D - \epsilon)(1 - g(x_P)).$$ Put those together and we have $$\lambda_D(1-g(x_P)) \le (\lambda_D-\epsilon)(1-g(x_p)).$$ Since $1-g(x_P)>0,$ we can divide it out and get $$\lambda_D \le \lambda_D - \epsilon,$$ a contradiction.