It is not appropriate to work in the space $H^1_0(U)$ since nonzero boundary conditions are being considered. You will have to assume some regularity of the boundary of $U$.
One version of Green's theorem (see e.g. the appendix in Evans) is that $$- \int_U (\Delta u) v \, dx = \int_U Du \cdot Dv \, dx -\int_{\partial U} \frac{\partial u}{\partial \nu} v \, dS.$$
A weak solution to the problem at hand can be proposed by setting $-\Delta u = f$ in $U$ and $\dfrac{\partial u}{\partial \nu} = -u$ on $\partial U$ so that $$\int_U fv = \int_U Du \cdot Dv + \int_{\partial U} uv \, dS \quad \forall v \in H^1(U)$$
or a bit more precisely
$$\int_U fv = \int_U Du \cdot Dv + \int_{\partial U} \newcommand{\tr}{\mathrm{Tr}\ \! }( \tr u )(\tr v) \, dS \quad \forall v \in H^1(U)$$
where $\tr : H^1(U) \to L^2(\partial U)$ is the trace operator.
An appropriate bilinear form is thus given by $$B[u,v] = \int_U Du \cdot Dv + \int_{\partial U} \newcommand{\tr}{\mathrm{Tr}\ \! }( \tr u )(\tr v) \, dS, \quad u,v \in H^1(U).$$
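Indeed, boundedness follows at once from the Cauchy-Schwarz inequality together with the trace estimate $\|\operatorname{Tr} u\|_{L^2(\partial U)} \le C \|u\|_{H^1(U)}$:
$$
|B[u,v]| \le \|Du\|_{L^2(U)} \|Dv\|_{L^2(U)} + \|\operatorname{Tr} u\|_{L^2(\partial U)} \|\operatorname{Tr} v\|_{L^2(\partial U)} \le (1 + C^2) \|u\|_{H^1(U)} \|v\|_{H^1(U)}.
$$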
$B$ is clearly bounded. As far as coercivity goes, it may be helpful to use the Rellich-Kondrachov theorem. I can follow up with a hint if you like.
It remains to show that there is a constant $\alpha > 0$ with the property that $\|u\|_{H^1}^2 \le \alpha B[u,u]$ for all $u \in H^1(U)$. This can be proven by contradiction: otherwise, for every $n \in \mathbb N$ there would exist $u_n \in H^1(U)$ with the property that $\|u_n\|^2_{H^1} > n B[u_n,u_n]$. For each $n$ define $v_n = \dfrac{u_n}{\|u_n\|_{H^1}}$. Then $v_n \in H^1(U)$, $\|v_n\|_{H^1} = 1$, and $B[v_n,v_n] < \dfrac 1n$ for all $n$.
Here we can invoke Rellich-Kondrachov. Since the family $\{v_n\}$ is bounded in the $H^1$ norm, there is a subsequence $\{v_{n_k}\}$ that converges in $L^2(U)$ to a limit $v$. However, since $\|Dv_{n_k}\|_{L^2}^2 < \dfrac{1}{n_k}$ it is also true that $Dv_{n_k} \to 0$ in $L^2$. Thus for any $\phi \in C_0^\infty(U)$ you have $$\int_U v D \phi \, dx = \lim_{k \to \infty} \int_U v_{n_k} D \phi \, dx = - \lim_{k \to \infty} \int_U D v_{n_k} \phi \, dx = 0.$$ This means $v \in H^1(U)$ with $D v = 0$, from which you can conclude $v_{n_k} \to v$ in $H^1(U)$. Since $\|v_{n_k}\|_{H^1} = 1$ for all $k$, it follows that $\|v\|_{H^1} = 1$ as well.
Next, since $\|\tr v_{n_k}\|_{L^2(\partial U)}^2 < \dfrac{1}{n_k}$ and the trace operator is bounded, there is a constant $C$ for which
$$ \|\tr v\|_{L^2(\partial U)} \le \|\tr v - \tr v_{n_k}\|_{L^2(\partial U)} + \|\tr v_{n_k}\|_{L^2(\partial U)} < C \|v - v_{n_k}\|_{H^1(U)} + \frac{1}{\sqrt{n_k}}.$$
Let $k \to \infty$ to find that $\tr v = 0$ in $L^2(\partial U)$.
Can you prove that if $v \in H^1(U)$, $Dv = 0$, and $\tr v = 0$, then $v = 0$? Once you have established that fact you arrive at a contradiction, since $v$ also satisfies $\|v\|_{H^1} = 1$. It follows that $B$ is in fact coercive.
The matrix has incomplete rank, so the system is solvable only if
$$
\operatorname{rank} M = \operatorname{rank} \begin{pmatrix}M \;\big|\; f\end{pmatrix}.
$$
Due to truncation error, the latter may not hold (though it would hold if you were using a conservative approximation, I suppose).
You can use a QR decomposition to deal with that. Suppose $M = QR$, where $Q$ is orthogonal and $R$ is upper triangular. The rank of $R$ equals the rank of $M$, which is one less than the number of its rows. So
$$
R = \begin{pmatrix}
\star & \star & \dots & \star & \star\\
0 & \star & \dots & \star & \star\\
0 & 0 & \ddots & \star & \star\\
0 & 0 & \dots & \star & \star\\
0 & 0 & \dots & 0 & 0\\
\end{pmatrix} = \begin{pmatrix}R' & h\\0&0\end{pmatrix}
$$
Solving the system $Mx = f$ is equivalent to solving the system
$$
Rx = Q^\top f \equiv \begin{pmatrix}z'\\\alpha\end{pmatrix}, \quad \alpha \in \mathbb R.
$$
For the system to have a solution, the last element $\alpha$ of $Q^\top f$ should be zero. Even if it is not, let us set it to zero. Let us also fix the last element of $x$ as zero, since it can be any number.
$$
x = \begin{pmatrix}x'\\0\end{pmatrix}
$$
Now we have a nonsingular triangular system
$$
R' x' = z'
$$
which can easily be solved.
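As a quick illustration in Python/NumPy (a sketch of the procedure above, not the Matlab code; the 1D Neumann Laplacian below is just a hypothetical stand-in singular matrix with the same rank-one deficiency):

```python
import numpy as np

# 1D Neumann Laplacian: symmetric, singular, rank n-1,
# left/right null vector (1, ..., 1) -- a stand-in for M.
n = 6
M = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M[0, 0] = M[-1, -1] = 1.0          # Neumann end rows: every row sums to zero

rng = np.random.default_rng(0)
f = rng.standard_normal(n)
f -= f.mean()                      # enforce the solvability condition: ones @ f = 0

Q, R = np.linalg.qr(M)             # last row of R is ~0 because rank(M) = n - 1
z = Q.T @ f

x = np.zeros(n)
x[:-1] = np.linalg.solve(R[:-1, :-1], z[:-1])  # R' x' = z', with x[-1] fixed to 0

print(np.linalg.norm(M @ x - f))   # tiny residual: x solves the singular system
```

Fixing the last component of $x$ to zero just picks one member of the solution family; for this stand-in matrix any constant shift of $x$ is also a solution.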
Note that for the scheme you are using the matrix $M$ is sparse, so you need a sparse version of the QR decomposition. Also, since $Q$ is generally not sparse even when $M$ is, you should perform the QR factorization on the extended matrix $(M \mid f)$ to compute both $R = Q^\top M$ and $Q^\top f$ at the same time.
Appendix
I've implemented this method in Matlab, along with the method of eliminating one equation. For consistent problems they give the same solution.
For testing purposes I used the square domain $[-1,1]\times[-1,1]$ and the following problem
$$
\begin{aligned}
&\Delta u = 0,\\
&\frac{\partial u}{\partial n} = g(x, y), \quad (x,y) \in \partial G.
\end{aligned}
$$
This problem has a well-known solvability criterion, namely
$$
0 = \int_G \Delta u \, dx \, dy = \int_{\partial G} \frac{\partial u}{\partial n} \, d\Gamma = \int_{\partial G} g \, d\Gamma.
$$
Due to numerical errors this can never be achieved exactly, so from a strict mathematical point of view the discrete problem is always ill-posed. We therefore have to deal with a regularized problem, that is, to solve a problem with
$$
\int_{\partial G} g d \Gamma \neq 0.
$$
That's where the methods differ. They eliminate the inconsistency in various ways.
Equation elimination
When we are using equation elimination, that is removing one governing equation and replacing it with
$$
u_{i,j} = 0
$$
for example, we state that all of the inconsistency should be eliminated through the point $(i,j)$. In fact, the real problem we are solving now is
$$
\begin{aligned}
&\Delta u = \delta(x - x_i, y - y_i) \int_{\partial G} g \, d\Gamma,\\
&\frac{\partial u}{\partial n} = g(x, y), \quad (x,y) \in \partial G,
\end{aligned}
$$
so we put a source or a sink at $(x_i, y_i)$ to resolve the inconsistency. For the test problem I want $u(x,y) = \frac{x^2}{2} - \frac{y^2}{2}$ to be the solution, so
$$
\begin{aligned}
&g(-1, y) = 1, \quad g(1, y) = 1,\\
&g(x, -1) = -1, \quad g(x, 1) = -1.
\end{aligned}
$$
To demonstrate the effect of the inconsistency I take $g(x, -1) = 0.5$.
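With this perturbed datum the compatibility condition indeed fails: each side of the square has length $2$, so
$$
\int_{\partial G} g \, d\Gamma = \underbrace{2}_{x=-1} + \underbrace{2}_{x=1} - \underbrace{2}_{y=1} + \underbrace{1}_{y=-1} = 3 \neq 0.
$$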
Note the pole-like spike in the middle: that was the point carrying the $u_{i,j} = 0$ equation. At every other point the function satisfies $\Delta u = 0$ and the boundary conditions.
QR method
Again, let
$$
Mu = f
$$
be decomposed as
$$
\begin{aligned}
&Mu \equiv QRP^\top u = f,\\
&RP^\top u = Q^\top f.
\end{aligned}
$$
I've added a permutation matrix $P$ to allow column reordering of $M$, which improves sparse QR a lot.
Again, let
$$
R = \begin{pmatrix}
R' & h\\
0 & 0
\end{pmatrix}, \quad Q^\top f = \begin{pmatrix}
z'\\\alpha
\end{pmatrix}
$$
So we state that
$$
x = \begin{pmatrix}
R'^{-1} z'\\
0
\end{pmatrix}
$$
is a regularized solution to
$$
Rx = z \Leftrightarrow Mx = f
$$
since $Mx = f$ simply does not have a solution. But what is the real problem we are solving now?
Recall that $M$ is singular; we also know a left null vector for it. Let $\omega = \frac{1}{\sqrt N}(1, 1, \dots, 1)^\top$, where $N$ is the number of rows of $M$. Then
$$
\omega^\top M
$$
is simply the (normalized) sum of the rows of $M$, and that is zero (actually, this is implementation-dependent, but the idea remains the same). So the last row of $R$ is nothing more than $\omega^\top M = 0$, and $\alpha = \omega^\top f \sim \int_{\partial G} g \, d\Gamma$ (up to a constant factor).
If we plug the regularized solution $x$ into the system $Rx = z$, we observe a residual:
$$
R x \neq z,\quad R x = z + \begin{pmatrix}0\\-\alpha\end{pmatrix}
$$
and by multiplying by $Q$ we have
$$
Mx = f - \alpha \omega
$$
since $\omega$ is exactly the last column of $Q$ (up to sign). So now we are solving a problem where the inconsistency is smeared over the whole domain $G$ and its boundary $\partial G$; roughly speaking,
$$
\Delta u = -\frac{1}{|G|} \int_{\partial G} g d\Gamma
$$
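This can be checked numerically. Here is a short Python/NumPy sketch (the small 1D Neumann Laplacian below is an illustrative stand-in for $M$, not the actual discretization):

```python
import numpy as np

# Stand-in for M: 1D Neumann Laplacian, singular,
# with normalized left null vector omega = ones/sqrt(n).
n = 6
M = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M[0, 0] = M[-1, -1] = 1.0

f = np.random.default_rng(1).standard_normal(n)  # deliberately inconsistent RHS
omega = np.ones(n) / np.sqrt(n)

Q, R = np.linalg.qr(M)
z = Q.T @ f
x = np.zeros(n)
x[:-1] = np.linalg.solve(R[:-1, :-1], z[:-1])    # drop the zero row, fix x[-1] = 0

r = M @ x - f                               # residual of the regularized solution
print(np.linalg.norm(r), abs(omega @ f))    # the two numbers coincide
print(np.allclose(r, (omega @ r) * omega))  # True: residual is parallel to omega
```

So the residual has norm $|\alpha| = |\omega^\top f|$ and points along $\omega$: the inconsistency is spread uniformly over all equations rather than dumped into one node.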
This method applied to the same problem gives the following solution
It looks fine, but actually the equation is violated (slightly) at every node, and the boundary conditions are also violated (again slightly).
Finally, there is not much difference in which method you use if you are solving a well-posed problem, such as one arising from
$$
\nabla \phi = \mathbf E \Rightarrow \Delta \phi = \operatorname{div} \mathbf E.
$$
You can find the Matlab code I used here
Best Answer
So I was able to prove existence and uniqueness for the PDE. I decided to prove existence and uniqueness from a variational point of view (cf. Chapter 8 of Evans).
Given \begin{equation*} \tag{0.1} \begin{cases} -\Delta u = f \text{ in } U \\ \frac{\partial u}{\partial n} + \beta(u) = 0 \text{ on } \partial U \end{cases} \end{equation*} with $0 < a \leq \beta'(z) \leq b$, we notice this implies that the antiderivative of $\beta$ is strictly convex, so we hope to find an energy associated with $(0.1)$ that is strictly convex, in order to get uniqueness.
To do this we observe that the energy \begin{equation*} E(u) := \int_{U} \frac{1}{2} |Du|^2 - fu \, dx + \int_{\partial U} \int_{0}^{Tr(u)} \beta(t) \, dt \, dH^{n-1} \end{equation*} ($Tr$ is the trace operator, which can be defined since $\partial U$ is smooth and bounded), minimized over $H^1(U)$, has the following Euler-Lagrange equation, obtained by taking the Frechet derivative in the direction of any smooth function $v \in C^{\infty}(\overline{U})$: \begin{equation*} \int_{U} Du \cdot Dv - fv \, dx + \int_{\partial U} \beta(Tr(u)) \, Tr(v) \, dH^{n-1} = 0, \end{equation*} where differentiating the boundary term is justified by $\beta \in C^1$ and the fundamental theorem of calculus. Now, by the joint convexity in $(u, Du)$ of the Lagrangian associated with $E$, $u$ solves $(0.1)$ if and only if it minimizes $E$ over $H^1(U)$. Uniqueness then follows, since the minimizer of $E$ is unique by strict convexity.
To show existence it suffices to show that a minimizer exists. To do this we want to exploit the weak topology of $H^1(U)$ by showing that a minimizing sequence $\{u_k\}$ of $E$ is bounded in the $H^1(U)$ norm.
This follows from the following lemma: if $f \in H^1(U)$, then there exists a constant $C$ independent of $f$ such that \begin{equation*} \|f\|_{H^1(U)} \leq C(\|Tr(f)\|_{L^2(\partial U)} + \|Df\|_{L^2(U)}) \end{equation*} (the proof is very similar to the usual proof of the Poincaré inequality). A routine estimate with Cauchy's inequality $ab \leq \epsilon a^2 + \frac{b^2}{4 \epsilon}$ then shows that the minimizing sequence is bounded, so we can extract a subsequence with \begin{equation*} u_{n_k} \rightarrow u \text{ in } L^2(U), \end{equation*} \begin{equation*} Du_{n_k} \rightharpoonup Du \text{ in } L^2(U). \end{equation*}
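To spell out why the minimizing sequence is bounded (a sketch under the stated assumption $a \le \beta'$): since $\beta(s) - \beta(0)$ has the same sign as $s$ and $|\beta(s) - \beta(0)| \ge a|s|$, one has $\int_0^t \beta(s) \, ds \ge \beta(0)t + \frac{a}{2}t^2$ for every $t$, hence \begin{equation*} E(u) \ge \frac{1}{2}\|Du\|^2_{L^2(U)} + \frac{a}{2}\|Tr(u)\|^2_{L^2(\partial U)} - \|f\|_{L^2(U)}\|u\|_{L^2(U)} - |\beta(0)| \, |\partial U|^{1/2}\|Tr(u)\|_{L^2(\partial U)}. \end{equation*} Combining this with the lemma and Cauchy's inequality absorbs the two linear terms into the quadratic ones, giving $E(u) \ge c\|u\|^2_{H^1(U)} - C$ along any minimizing sequence.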
Then, as the Lagrangian of $E$ is convex in $Du$, the functional is lower semicontinuous with respect to this weak convergence, so $u$ is in fact a minimizer; hence a minimizer exists.