Ok! If you remember generalized policy iteration, the idea is that a policy $\pi$ is first evaluated, giving $v_\pi(s)$, and then improved into $\pi'$ by acting greedily with respect to that evaluation: $\pi' = \text{greedy}(v_\pi)$. The theorem states that acting $\varepsilon$-greedily will *still* result in policy improvement, i.e. a policy with value at least as great: $v_{\pi'}(s) \geq v_\pi(s)$. The proof works this out in reverse.
Line 1 left-side: we evaluate our improved policy, $\pi'$, by an action-value function (instead of a value function) with respect to our original policy: $q_\pi(s,\pi'(s)).$
Line 1 right-side: the same quantity written as an expectation over $\pi'$: $\sum_{a \in A}\pi'(a|s)q_\pi(s,a) \approx v_{\pi'}(s)$ (on the approximation, see the subtlety noted at the end).
Line 2: we expand $\pi'(a|s)$, the probability of taking an action in a given state, to conform to our definition of being $\varepsilon$-greedy: we choose the max action with probability $1-\varepsilon$, or a random action (which could include the max action) with probability $\varepsilon/m$ each, where $m$ is the total number of actions.
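The expansion in line 2 can be sketched as a small function that builds $\pi'(a|s)$ for one state (the array of $q$-values and the choice of $\varepsilon$ here are hypothetical, just for illustration):

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon):
    """Return pi'(a|s) for one state under an epsilon-greedy policy."""
    m = len(q_values)
    probs = np.full(m, epsilon / m)                   # exploration share for every action
    probs[int(np.argmax(q_values))] += 1.0 - epsilon  # extra mass on the max action
    return probs

probs = epsilon_greedy_probs(np.array([1.0, 2.0, 0.5]), epsilon=0.3)
# the max action (index 1) carries 1 - 0.3 + 0.3/3 = 0.8; the others carry 0.1 each
```

Note that the max action receives both shares, $1-\varepsilon$ plus $\varepsilon/m$, which is why the probabilities still sum to one.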
Line 3: This is by far the most fun part of the proof. $\max_{a \in A}q_\pi(s,a)$ is replaced by $\sum_{a \in A}\frac{\pi(a|s)-\varepsilon/m}{1-\varepsilon}q_\pi(s,a).$ The claim is that the max over actions is greater than or equal to this weighted sum: since $\pi$ is $\varepsilon$-soft, the weights are non-negative and sum to one, so a maximum always dominates a weighted average, whichever action happens to be the max. To see this, imagine there are two actions to choose from, $A = \{a_1,a_2\}$, where $a_2$ is arbitrarily selected as the max under $\pi$:
$\sum_{a \in A}\frac{\pi(a|s)-\varepsilon/2}{1-\varepsilon}q_\pi(s,a) = \frac{(\varepsilon/2 -\varepsilon/2)q_\pi(s,a_1) + ((1-\varepsilon) + \varepsilon/2 -\varepsilon/2)q_\pi(s,a_2)}{1-\varepsilon} = \frac{(1-\varepsilon)q_\pi(s,a_2)}{1-\varepsilon} = q_\pi(s,a_2).$
Equality holds when the max action under $\pi'$ is the same as under $\pi$, namely $a_2$: $\max_{a \in A}q_\pi(s,a) = q_\pi(s,a_2)$. The inequality is strict when the max action under $\pi'$ is not $a_2$, in which case $\max_{a \in A}q_\pi(s,a) > q_\pi(s,a_2)$.
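The two-action cancellation above can be checked numerically. The specific values of $\varepsilon$ and the $q$-values are hypothetical; only the structure (an $\varepsilon$-greedy $\pi$ with $a_2$ as its max) comes from the text:

```python
import numpy as np

# Hypothetical numbers for the two-action example; a_2 is the max under pi.
epsilon = 0.2
q = np.array([0.3, 1.0])                                 # q_pi(s,a_1), q_pi(s,a_2)
pi = np.array([epsilon / 2, 1 - epsilon + epsilon / 2])  # epsilon-greedy pi

weights = (pi - epsilon / 2) / (1 - epsilon)  # [0, 1] after cancellation
weighted_sum = float(weights @ q)             # collapses to q_pi(s, a_2)
```

Here `weighted_sum` equals `q[1]` exactly, and `q.max()` can never fall below it, which is the inequality the proof needs.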
Line 4: cancel terms and simplify.
What is crucial to understand is that generalized policy iteration runs multiple iterations of evaluation and improvement. The max action under $\pi$ can differ from the max action under $\pi'$ (hence the arbitrary choice of $a_2$ as max), because each policy was improved under a different evaluation. This theorem is essentially saying, "Hey, greedily selecting actions after looking at my last choice of greedily selecting actions will either improve my overall reward or just keep it the same." One subtlety (for understanding line 2) is that the exact identity is $\sum_{a \in A}\pi'(a|s)q_{\pi'}(s,a) = v_{\pi'}(s)$ — an equality, not an approximation.
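The theorem's one-step claim can also be sanity-checked for a single state: for any $\varepsilon$-soft $\pi$, the $\varepsilon$-greedy $\pi'$ built from the same $q_\pi$ never does worse in expectation. The random $q$-values and the particular $\varepsilon$-soft $\pi$ below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon, m = 0.1, 5
q = rng.normal(size=m)  # hypothetical q_pi(s, .) for one state

# Any epsilon-soft pi: an epsilon/m floor plus (1 - epsilon) arbitrary mass.
pi = epsilon / m + (1 - epsilon) * rng.dirichlet(np.ones(m))

# The epsilon-greedy improvement pi' built from the same q.
pi_prime = np.full(m, epsilon / m)
pi_prime[int(np.argmax(q))] += 1 - epsilon

# Line 1 of the proof: pi' is at least as good as pi one step ahead.
assert pi_prime @ q >= pi @ q - 1e-12
```

The assertion holds for every draw, mirroring $\sum_a \pi'(a|s)q_\pi(s,a) \geq v_\pi(s)$.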
Best Answer
By a non-greedy action, they mean picking an action from those available in state $s$, $A(s)$, with equal probability; hence the $\frac{1}{|A(s)|}$ term.
It is possible that we pick an action which coincides with the greedy strategy.
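As a minimal sketch of that uniform draw (the action names here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
available = ["left", "right", "up"]  # hypothetical A(s)

# Draw uniformly: each action has probability 1/|A(s)|, and the draw
# may still coincide with the greedy action.
action = available[rng.integers(len(available))]
```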