Being diagonal is not a property of an operator but of a matrix. That there exists an orthonormal basis $\beta$ in which $T$ is represented by a diagonal matrix doesn't imply that it is represented by a diagonal matrix in all orthonormal bases, so there's no contradiction here.
With respect to any orthonormal basis a self-adjoint operator is represented by a Hermitian (or self-adjoint) matrix, and the fact that there exists a basis $\beta$ in which it is represented by a diagonal matrix corresponds to the fact that every Hermitian matrix is diagonalizable by a unitary matrix.
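To make this concrete, here is a small numerical sketch in plain Python. The matrix `H`, the unitary `U`, and the helper names are illustrative choices, not part of the original question. It shows one self-adjoint operator whose matrix is non-diagonal in the standard basis and diagonal in the orthonormal eigenbasis.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

def is_diagonal(X, tol=1e-12):
    """True if every off-diagonal entry is zero up to the tolerance."""
    n = len(X)
    return all(abs(X[i][j]) <= tol
               for i in range(n) for j in range(n) if i != j)

# Matrix of a self-adjoint operator in the standard orthonormal basis:
# real symmetric (hence Hermitian), but not diagonal.
H = [[2.0, 1.0],
     [1.0, 2.0]]

# Columns of U are the orthonormal eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
U = [[s,  s],
     [s, -s]]

# U is real, so its conjugate transpose is simply its transpose.
D = matmul(transpose(U), matmul(H, U))

print(is_diagonal(H))  # False
print(is_diagonal(D))  # True
```

The diagonal entries of `D` are the eigenvalues $3$ and $1$ of `H`, up to floating-point rounding.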
I think our OP H_1317's proof is conceptually correct.
Here is a somewhat more abstract proof:
Suppose $A$ is upper triangular; then I claim that $A^{-1}$ is also upper triangular; for we may write
$A = D + T, \tag 1$
where $D$ is diagonal and $T$ is strictly upper triangular; that is, the diagonal entries of $T$ are all zero; we observe that, since $A$ is triangular, $\det(A)$ is the product of the diagonal entries of $A$; since $A$ is unitary, it is non-singular and thus $\det(A) \ne 0$, so none of the diagonal entries of $A$ vanish, and the same applies to $D$; therefore $D$ is invertible and we may write
$A = D(I + D^{-1}T); \tag 2$
we next observe that $D^{-1}T$ is itself strictly upper triangular, hence nilpotent; in fact we have
$(D^{-1}T)^n = 0, \tag 3$
where $n = \text{size}(A)$; the nilpotence of $D^{-1}T$ allows us to write an explicit inverse for $I + D^{-1}T$; indeed, we have the well-known formula
$(I + D^{-1}T) \displaystyle \sum_{k=0}^{n - 1} (-D^{-1}T)^k = I - (-1)^n (D^{-1}T)^n = I; \tag 4$
thus,
$(I + D^{-1}T)^{-1} = \displaystyle \sum_{k=0}^{n - 1} (-D^{-1}T)^k; \tag 5$
since every matrix $(-D^{-1}T)^k$ occurring in this sum is upper triangular, we see that $(I + D^{-1}T)^{-1}$ is upper triangular as well; from (2),
$A^{-1} = (I + D^{-1}T)^{-1}D^{-1}, \tag 6$
which shows that $A^{-1}$ is upper triangular.
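As a sanity check on the construction in (2), (5) and (6), here is a minimal sketch using exact rational arithmetic. The function name `upper_triangular_inverse` and the sample matrix are hypothetical and chosen only for illustration; the code assumes the input is upper triangular with nonzero diagonal entries.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices of Fractions given as lists of rows."""
    n = len(X)
    return [[sum((X[i][k] * Y[k][j] for k in range(n)), Fraction(0))
             for j in range(n)] for i in range(n)]

def identity(n):
    """The n-by-n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def upper_triangular_inverse(A):
    """Invert an upper triangular A with nonzero diagonal via A = D(I + D^{-1}T)."""
    n = len(A)
    # N = D^{-1} T: scale row i of the strictly upper part by 1 / A[i][i].
    N = [[A[i][j] / A[i][i] if j > i else Fraction(0) for j in range(n)]
         for i in range(n)]
    minus_N = [[-x for x in row] for row in N]
    # Finite sum of (-N)^k for k = 0, ..., n - 1, as in equation (5).
    total = identity(n)
    power = identity(n)
    for _ in range(1, n):
        power = matmul(power, minus_N)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    # Equation (6): right-multiplying by D^{-1} scales column j by 1 / A[j][j].
    return [[total[i][j] / A[j][j] for j in range(n)] for i in range(n)]

A = [[Fraction(2), Fraction(1), Fraction(3)],
     [Fraction(0), Fraction(4), Fraction(5)],
     [Fraction(0), Fraction(0), Fraction(6)]]

A_inv = upper_triangular_inverse(A)

# The product is exactly the identity, and A_inv has zeros below the diagonal.
print(matmul(A, A_inv) == identity(3))                            # True
print(all(A_inv[i][j] == 0 for i in range(3) for j in range(i)))  # True
```

Using `Fraction` avoids rounding, so the equality with the identity can be tested exactly.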
Having established my claim, we now invoke the unitarity of $A$:
$A^\dagger A = AA^\dagger = I, \tag 7$
i.e.,
$A^\dagger = A^{-1}; \tag 8$
we have by definition
$A^\dagger = (A^\ast)^T = ((D + T)^\ast)^T = (D^\ast + T^\ast)^T = (D^\ast)^T + (T^\ast)^T, \tag 9$
from which we see that $A^\dagger$ is lower triangular when $A$, and hence $A^{-1}$, is upper triangular;
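the only way (8) can hold is for both sides to be upper and lower triangular at once, that is, diagonal; then $A = (A^\dagger)^\dagger$ is diagonal as well, which forces $T = 0$; therefore we see that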
$A = D \tag{10}$
is a diagonal matrix.
Of course, if $A$ is lower triangular the same result holds, the proof being almost identical to that given above. $OE\Delta$.
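The result can be illustrated, though of course not proved, on small examples. The sketch below uses two hand-picked matrices (both are assumptions made for illustration): a diagonal matrix with unit-modulus entries, which is unitary, and an upper triangular matrix with a nonzero off-diagonal entry, which is not.

```python
def conj_transpose(X):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(X, tol=1e-12):
    """True if X^dagger X equals the identity up to the tolerance."""
    n = len(X)
    P = matmul(conj_transpose(X), X)
    return all(abs(P[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

# Diagonal with unit-modulus entries: triangular and unitary.
diagonal = [[1, 0],
            [0, 1j]]

# Upper triangular with unit-modulus diagonal but a nonzero entry above it.
shear = [[1, 0.5],
         [0, 1]]

print(is_unitary(diagonal))  # True
print(is_unitary(shear))     # False
```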
Best Answer
Hint: Take $x$ to be the standard basis vector $e_i$ and $y$ to be the standard basis vector $e_j$. So $\langle e_i, Ae_j\rangle=\langle Ae_i, e_j\rangle$. This should give you some info about the matrix $A$.
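To see what the two sides of the hint compute, here is a short sketch in plain Python. It assumes the standard inner product on $\mathbb{C}^n$, taken conjugate-linear in the first argument (a convention, not something stated in the question), and uses an arbitrary sample matrix `A` that is not assumed to be self-adjoint.

```python
def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def e(i, n):
    """The i-th standard basis vector of C^n (0-indexed)."""
    return [complex(int(k == i)) for k in range(n)]

# An arbitrary complex matrix, used only to read off entries.
A = [[1 + 2j, 3 - 1j],
     [4j,     5 + 0j]]
n = len(A)

for i in range(n):
    for j in range(n):
        left = inner(e(i, n), apply(A, e(j, n)))   # <e_i, A e_j>
        right = inner(apply(A, e(i, n)), e(j, n))  # <A e_i, e_j>
        assert left == A[i][j]
        assert right == A[j][i].conjugate()
```

So the left-hand side picks out the entry $A_{ij}$ and the right-hand side picks out $\overline{A_{ji}}$; comparing the two for all $i, j$ is the information the hint refers to.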