Rather than thinking about the triangle itself, think about the following problem:
Say you have a square cake of unit side length. Uniformly at random, you pick a number $0 < x < 1$ and you slice the cake vertically into two rectangles with side lengths $x, 1$ and $1-x, 1$. Next, independently of the first cut and also uniformly at random, you pick a number $0 < y < 1$ and you slice the cake horizontally at $y$, so that now you have four rectangular pieces with sides $x, y$, $1-x, y$, $x, 1-y$, and $1-x, 1-y$.
If I ask you, what is the average area of the rectangle with side lengths $x, y$, it should be obvious that this is $(1/2)(1/2) = 1/4$. In fact, this is the average area of any one of the four pieces of cake that were created, since by an elementary symmetry argument, there is no difference between the probability distributions of the areas of the four pieces: You could, for instance, flip the cake horizontally or vertically. And since the total area of the cake is $1$, and cutting it twice leaves $4$ pieces, the average area should be $1/4$.
Another way to think about this is that if $X$ and $Y$ are the independent and identically distributed random variables that describe the locations of the two cuts along each side, then $XY$ is the random area of one of the four pieces, and $$\operatorname{E}[XY] \overset{\text{ind}}{=} \operatorname{E}[X]\operatorname{E}[Y] = (1/2)(1/2) = 1/4,$$ where the equality holds because $X$ and $Y$ are independent. Then since the average location of each cut is $1/2$, the result follows.
Now, what does this have to do with the triangle? Well, that triangle is just half the area of the aforementioned rectangle.
After seeing the edit, it is difficult to address the specific question without knowing what level of discussion and prerequisite theorems can be taken as true. You want something "in depth" and "conceptual" but, for instance, I do not know if you accept the formula for the expectation of a continuous random variable, say $$\operatorname{E}[X] = \int_{x \in \Omega} x f_X(x) \, dx,$$ or if you need that to be proven. If you need a conceptual explanation for why this holds true, then an analogy with a discrete-valued random variable is better motivation; e.g., we consider that the expected value of a discrete random variable is in a sense a weighted average of its outcomes, weighted by the likelihood of each such outcome: $$\operatorname{E}[X] = \sum_{x \in \Omega} x \Pr[X = x],$$ where $\Omega$ is the support of $X$. Replacing the sum by an integral and the probability mass by a probability density gives the previous formula and can be thought of as a limiting process similar to that of Riemann summation.
The bivariate continuous case is a natural extension of the above. We simply take the realization of an area $A(X,Y) = XY/2$ for each possible $(X,Y) \in [0,1]^2$, and weight it by the probability density of its occurrence. This leads to $$\operatorname{E}[A(X,Y)] = \int_{x=0}^1 \int_{y=0}^1 \frac{xy}{2} f_{X,Y}(x,y) \, dy \, dx = \frac{1}{2} \int_{x=0}^1 x \, dx \int_{y=0}^1 y \, dy,$$ since the joint density is uniform on the unit square; i.e. $f_{X,Y}(x,y) = 1$ for $0 \le x, y \le 1$.
That said, all of the above is something I would consider to be more conceptually sophisticated than the original question. However, there really is no simpler way to explain the meaning of the aforementioned integral. To ask for something even simpler would be like asking to explain what it means to integrate a function without talking about limits or sums.
The basic idea is to represent the process of drawing "colored balls with numbers" as an absorbing Markov chain. Then the fundamental matrix gives us the expected number of steps to reach an absorbing state:
$$ N = \sum_{k=0}^\infty Q^k = (I - Q)^{-1} $$
when the probability transition matrix $P$ is written in block form with all absorbing states grouped at the end:
$$ P = \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix} $$
That is, the expected number of steps needed to reach an absorbing state starting from a specified transient (non-absorbing) state will be the entry corresponding to that state in the row vector $N \mathbf{1}$ where $\mathbf{1}$ is the vector of all ones.
States that are not absorbing are called transient states, and one can think of the similarity of the general formula above to the limit of a geometric series as result of moving among the transient states until an absorbing state is reached.
In any case using the case $m = 2$ colors and $n = 2$ numbers described in the Question will illustrate the computation. Any first step always give us one color and one number, which state we label $(1,1)$. The absorbing state, both colors and both numbers, we label $(2,2)$. The transition matrix thus involves three transient states and one absorbing state:
$$ P = \begin{bmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
0 & 1/2 & 0 & 1/2 \\
0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1 \end{bmatrix} $$
$$ I - Q = \begin{bmatrix}
3/4 & -1/4 & -1/4 \\
0 & 1/2 & 0 \\
0 & 0 & 1/2 \end{bmatrix} $$
$$ (I - Q)^{-1} \mathbf 1 = [ 8/3 \;\; 2 \;\; 2 ]^T $$
After one step, giving one color and one number, the expected number of steps until hitting the absorbing state is $8/3$. In total we expect $1 + 8/3 = 11/3$ steps to reach the absorbing state with two colors and two numbers.
The uniform probabilities among colors as well as among numbers allows us to represent the transient states in a concise fashion, $(i,j)$ where $i$ is the count of distinct colors sampled so far and $j$ is the count of distinct numbers sampled. Immediately after the first sample is drawn we have $i,j = 1$, so only the states with $1 \le i \le m, 1 \le j \le n$ need to be represented. Of those $mn$ states, the state $(m,n)$ is absorbing and the other $mn-1$ states are transient. So $Q$ will be size $(mn-1)\times(mn-1)$, and the fact that sampled counts can never decrease means that $Q$ is upper triangular if we order the states naturally (lexicographically).
Evaluating $(I-Q)^{-1} \mathbf 1$ can therefore be done by backsolving the system $(I-Q) \mathbf v = \mathbf 1$. This is very efficient in that the matrix is not explicitly inverted and has complexity $O(m^2n^2)$ arithmetic operations.
Once that is done, adding one to the first entry of $\mathbf v$ (accounts for taking the first step) results in the expected number of steps to finish the "two dimensional coupon collector" task.
Best Answer
Hint: $A=E(X_1^2)$ and $B=E(X_1X_2)=E(X_1)E(X_2)=E^2(X_1)$