That the first condition implies the second is immediate, since (using your notation) you always have $m_i \le f(\xi_i) \le M_i$, so the sums in the second definition are caught between the $L$ and $U$ sums.
Edit: in response to a comment, an additional explanation is necessary here. For this direction it suffices to show that $I_* = \lim_{\|P\|\rightarrow 0} L(f,P)$ and $I^* = \lim_{\|P\|\rightarrow 0} U(f,P)$. Since both parts are similar, it suffices to show, e.g., the first equality.
First, it is easy to see that for partitions $P\subset P^\prime$ we have $L(f,P)\le L(f,P^\prime)$. A remaining hurdle is that for two partitions we do not necessarily know that one is a subset of the other. This is resolved by looking at common refinements:
Assume $P$ satisfies $L(f,P) > I_* - \varepsilon$ and $Q$ is an arbitrary partition. We need to show that there is a refinement $Q^\prime$ of $Q$ such that $L(f,Q^\prime)\ge L(f,P)$ (and, consequently, $L(f,Q^\prime)>I_*-\varepsilon$).
For $Q^\prime$ one can choose the common refinement $R$: if $P=\{x_1,\ldots, x_n \}$ and $Q=\{y_1,\ldots, y_m \}$, then we just let $R = P\cup Q$. Since this is a refinement of both $P$ and $Q$, we have both $L(f,R)\ge L(f,P)$ and $L(f,R)\ge L(f,Q)$.
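A quick numerical sanity check of the two facts used above (a sketch, not part of the proof). It assumes a nondecreasing $f$, so that $m_i = f(x_{i-1})$ and $M_i = f(x_i)$ can be read off the endpoints; the function `f`, the partitions `P`, `Q`, and all helper names are illustrative choices, not taken from the question.

```python
from fractions import Fraction as F

def lower_sum(f, P):
    # Assumes f is nondecreasing, so the inf on [a, b] is f(a).
    return sum(f(a) * (b - a) for a, b in zip(P, P[1:]))

def upper_sum(f, P):
    # Assumes f is nondecreasing, so the sup on [a, b] is f(b).
    return sum(f(b) * (b - a) for a, b in zip(P, P[1:]))

def tagged_sum(f, P, tags):
    # Riemann sum with one tag per subinterval.
    return sum(f(t) * (b - a) for (a, b), t in zip(zip(P, P[1:]), tags))

def common_refinement(P, Q):
    # R = P ∪ Q, sorted so that consecutive points bound the subintervals.
    return sorted(set(P) | set(Q))

f = lambda x: x * x                      # nondecreasing on [0, 1]
P = [F(0), F(1, 2), F(1)]
Q = [F(0), F(1, 3), F(2, 3), F(1)]
R = common_refinement(P, Q)
midpoints = [(a + b) / 2 for a, b in zip(R, R[1:])]

# Lower sums of P, Q and of their common refinement R.
print(lower_sum(f, P), lower_sum(f, Q), lower_sum(f, R))
# Lower sum, a tagged sum, and upper sum on the same partition R.
print(lower_sum(f, R), tagged_sum(f, R, midpoints), upper_sum(f, R))
```

Exact rational arithmetic is used so that the comparisons $L(f,R)\ge L(f,P)$, $L(f,R)\ge L(f,Q)$ and $L(f,R)\le \sum f(\xi_i)\Delta x_i \le U(f,R)$ are not blurred by rounding.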
Second edit: the original version was not correct:
For the other direction it suffices to show that if the function is integrable in the sense of the second definition, then both $I_*$ and $I^*$ agree with the limit $L$ of the sums from the second definition. Since the reasoning is the same in both cases, I'll just look at $I_*$.
So fix $\varepsilon >0$. By the second definition, we have
$$|L - \sum_{i=1}^n f(\xi_i)\Delta x_i |< \frac{\varepsilon}{2}$$
for every choice of tags $\xi_i$, if only the partition is fine enough.
Choose such a partition $P=\{x_0,\dots, x_n\}$ and in each $[x_{i-1},x_{i}]$ choose $\eta_i\in[x_{i-1},x_{i}]$ such that for
$m_i:=\inf \{ f(x):x\in [x_{i-1},x_i]\} $
we have $$0\le f(\eta_i)-m_i\le \frac{\varepsilon}{2(x_n-x_0)}$$
Then, since $\sum_{i=1}^n \Delta x_i = x_n - x_0$,
\begin{eqnarray}
| L -\sum_{i=1}^n m_i \Delta x_i|
& = & |L- \sum_{i=1}^n f(\eta_i)\Delta x_i + \sum_{i=1}^n f(\eta_i)\Delta x_i
-\sum_{i=1}^n m_i\Delta x_i| \\
&\le & |L- \sum_{i=1}^n f(\eta_i)\Delta x_i| + \sum_{i=1}^n | f(\eta_i)
- m_i|\Delta x_i \\
& < & \frac{\varepsilon}{2} + \sum_{i=1}^n \frac{\varepsilon}{2(x_n-x_0)}\Delta x_i=\varepsilon
\end{eqnarray}
If you 'see' that $0 \le L -I_*\le L -\sum_{i}m_i \Delta x_i$, then you are done here. Otherwise, it follows easily from the last estimate that the sums $\sum_i m_i \Delta x_i$ are, for any partition which is fine enough, $\varepsilon$-close to the fixed real number $L$, which of course implies that the $\sup$ over these sums exists and equals $L$ (here you need to use again the fact that the lower sums approach the $\sup$, if it exists, as the width of the partitions goes to $0$).
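To see the last statement numerically (a sketch under stated assumptions, not a proof): take the hypothetical example $f(x)=x^2$ on $[0,1]$, where $L=1/3$ and, since $f$ is nondecreasing, $m_i=f(x_{i-1})$. For uniform partitions the gap $L-\sum_i m_i\Delta x_i$ shrinks as the mesh $1/n$ goes to $0$. All names below are illustrative.

```python
from fractions import Fraction as F

def uniform_partition(n):
    # Points 0 = x_0 < x_1 < ... < x_n = 1 with mesh 1/n.
    return [F(i, n) for i in range(n + 1)]

def lower_sum_increasing(f, P):
    # For nondecreasing f the infimum m_i on [x_{i-1}, x_i] is f(x_{i-1}).
    return sum(f(a) * (b - a) for a, b in zip(P, P[1:]))

f = lambda x: x * x
L = F(1, 3)  # exact value of the integral of x^2 over [0, 1]

# Gap between L and the lower sum for finer and finer uniform partitions.
gaps = {n: L - lower_sum_increasing(f, uniform_partition(n)) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, float(gap))
```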
There are several issues with the proposed definition.
When we partition the domain $D$, we usually require that the members of the partition either are not closed or are allowed to intersect along their boundaries or in subsets of measure zero. The reason for this is that a connected set cannot be a union of finitely many (at least two) pair-wise disjoint closed non-empty subsets. Moreover, by the Sierpiński theorem (see Appendix below), if a continuum (that is, a connected compact space) is partitioned into countably many pair-wise disjoint closed subsets, then at most one member of the partition is non-empty.
Defining the mesh $\|P\|$ of the partition as the measure of its largest member is bad, because then the Riemann sums can fail to converge as $\|P\|$ tends to zero even for a continuous non-constant function defined on a square: we can partition the square into thin strips on each of which the function has big oscillation.
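The thin-strip phenomenon can be made concrete with a small sketch. The example is my own illustrative choice, not from the question: $f(x,y)=x$ on the unit square, cut into $n$ horizontal strips $[0,1]\times[(j-1)/n,\,j/n]$. Each strip has area $1/n$, which tends to zero, but its diameter stays above $1$, and the Riemann sums depend entirely on where the tags sit.

```python
from fractions import Fraction as F

def strip_riemann_sum(f, n, tag_x):
    """Riemann sum of f over the unit square cut into n horizontal strips.

    Strip j is [0, 1] x [(j-1)/n, j/n]; it is tagged at the point
    (tag_x, y_j), where y_j is the midpoint of the strip's y-range.
    """
    area = F(1, n)
    total = F(0)
    for j in range(1, n + 1):
        y_mid = F(2 * j - 1, 2 * n)
        total += f(tag_x, y_mid) * area
    return total

f = lambda x, y: x  # continuous, non-constant; its integral over the square is 1/2

for n in (10, 100, 1000):
    left = strip_riemann_sum(f, n, F(0))   # all tags on the left edge
    right = strip_riemann_sum(f, n, F(1))  # all tags on the right edge
    print(n, left, right)
```

However large $n$ is, the two tag choices give sums that stay a fixed distance apart, so no limit exists as the largest area goes to zero.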
Thus, I think the usual measure of the mesh $\|P\|$ is the diameter of its largest member. For instance, such a definition of Riemann sums was proposed in the book [Fich], which I inherited from my mother. In it the integration domain $D$ is partitioned into finitely many parts by a family of curves in the two-dimensional case (see Chapter 16, §1, 586) and of surfaces in the three-dimensional case (see Chapter 18, §1, 643).
It is not very natural to define a Riemann integral based on Lebesgue measure. But if the members of the partitions are so nice that they are Jordan measurable, then they can be approximated (with respect to the measure) by bricks. In this case, for a continuous function the limit of the Riemann sums exists and equals the integral defined via coverings by bricks (sub-hyperrectangles).
When the domain $D$ is not Jordan measurable, the Riemann integral fails to exist on it even for a non-zero constant function. This can happen even when $D$ is compact and connected. For instance, a cone over a fat Cantor set is not Jordan measurable. Its inner Jordan measure vanishes, since its complement is dense; however, its outer Jordan measure does not vanish, since it cannot be less than its Lebesgue measure.
Appendix (from [Eng])
6.1.27. The Sierpiński theorem. If a continuum $X$ has a countable cover $\{X_i\}_{i=1}^\infty$ by pair-wise disjoint closed subsets, then at most one of the sets $X_i$ is non-empty.
Proof. Let $X =\bigcup_{i=1}^\infty X_i$, where the sets $X_i$ are closed and $X_i\cap X_j =\varnothing$ whenever $i\ne j$;
assume that at least two of the sets $X_i$ are non-empty. From Lemma 6.1.26 it follows that there exists a decreasing sequence $C_1\supset C_2\supset\dots$ of continua contained in $X$ such that
$$C_i\cap X_i =\varnothing\mbox{ and }C_i\ne\varnothing\mbox{ for } i = 1, 2, \dots\tag{3}$$
The first part of (3) implies that $\left(\bigcap_{i=1}^\infty C_i\right)\cap\left(\bigcup_{i=1}^\infty X_i\right)=\varnothing$, i.e., that $\bigcap_{i=1}^\infty C_i=\varnothing$, and yet from the second part of (3) and compactness of $X$ it follows that
$\bigcap_{i=1}^\infty C_i\ne\varnothing$. $\square$
6.1.26. Lemma. If a continuum $X$ is covered by pair-wise disjoint closed sets $X_1, X_2,\dots$ of which at least two are non-empty, then for every $i$ there exists a continuum $C\subset X$ such that $C\cap X_i=\varnothing$ and at least two sets in the sequence $C\cap X_1, C\cap X_2,\dots$ are non-empty.
Proof. If $X_i =\varnothing$ we let $C = X$; thus we can assume that $X_i\ne\varnothing$. Take a $j\ne i$ such that $X_j\ne\varnothing$ and any disjoint open sets $U, V\subset X$ satisfying $X_i\subset U$ and $X_j\subset V$. Let $x$ be a point of $X_j$ and $C$ the component of $x$ in the subspace $\overline{V}$. Clearly, $C$ is a continuum, $C\cap X_i =\varnothing$ and $C\cap X_j\ne\varnothing$. Since $C\cap\operatorname{Fr}\overline{V}\ne\varnothing$, by virtue of the previous lemma, and since $X_j\subset \operatorname{Int}\overline{V}$, there exists a $k\ne j$ such that $C\cap X_k\ne\varnothing.$ $\square$
References
[Eng] Ryszard Engelking, General Topology, 2nd ed., Heldermann, Berlin, 1989.
[Fich] Grigorii Fichtenholz, Differential and Integral Calculus, vol. III, 4th ed., Nauka, Moscow, 1966 (in Russian).
Best Answer
You can generalize your "weaker" definition as follows.
Let $\mathcal{I}$ denote the set of closed intervals $[a,b]$ and let $\tau : \mathcal{I} \to \mathbb{R}$ be any function such that $\tau([a,b]) \in [a,b]$.
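Examples are $\tau([a,b]) = a$, $\tau([a,b]) = b$, $\tau([a,b]) = \frac{a + b}{2}$, etc.
Define $S(f,P,\tau) = \sum_{i=1}^n f(\tau([x_{i-1},x_i])) (x_i-x_{i-1})$ for a partition $P=\{x_0,\dots,x_n\}$. Let us say that $f$ is $\tau$-integrable with $\tau$-integral $K$ if the obvious condition is satisfied, namely $S(f,P,\tau)\to K$ as $\|P\|\to 0$: for every $\varepsilon>0$ there is a $\delta>0$ such that $|S(f,P,\tau)-K|<\varepsilon$ for all partitions $P$ with $\|P\|<\delta$.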
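As a minimal sketch of this definition, here is $S(f,P,\tau)$ with the three rules $\tau$ mentioned above. The function names and the test function $f(x)=x^2$ on $[0,1]$ are assumptions made for illustration only.

```python
def tau_sum(f, P, tau):
    # S(f, P, tau): one tag per subinterval, chosen by the rule tau(a, b) in [a, b].
    return sum(f(tau(a, b)) * (b - a) for a, b in zip(P, P[1:]))

# The three example rules from the text.
tau_left = lambda a, b: a
tau_right = lambda a, b: b
tau_mid = lambda a, b: (a + b) / 2

def uniform_partition(a, b, n):
    # n + 1 equally spaced points from a to b (mesh (b - a) / n).
    return [a + (b - a) * i / n for i in range(n + 1)]

f = lambda x: x * x
P = uniform_partition(0.0, 1.0, 1000)

s_left = tau_sum(f, P, tau_left)
s_mid = tau_sum(f, P, tau_mid)
s_right = tau_sum(f, P, tau_right)
print(s_left, s_mid, s_right)  # all three should be near 1/3
```

Checking a single family of partitions like this only illustrates the definition; $\tau$-integrability requires the convergence for all partitions with small mesh.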
As pointed out by Paramanand Singh, the following are equivalent for a bounded function $f$:
(1) $f$ is Darboux integrable
(2) $f$ is Riemann integrable
(3) $f$ is $\tau$-integrable for all $\tau$
(4) $f$ is $\tau$-integrable for some $\tau$
$\tau$-integrability seems to be conceptually simpler than Riemann integrability because it avoids tags. But it should be clear that there is substantial arbitrariness in the choice of $\tau$. It can be defined by a simple rule as in your question, but it can also be "erratic". It may even depend on $f$ if you want. For example, if $f$ is continuous, then you can take $\tau([x_{i-1},x_i])$ to be any point of $[x_{i-1},x_i]$ at which $f \mid_{[x_{i-1},x_i]}$ attains its minimum (or maximum).
So what might be the benefit of general tags? Here are some arguments.
a) If you know that $f$ is integrable (for example if $f$ is continuous or monotonic), then you can choose suitable tags $T$ which make $S(f,P,T)$ explicitly evaluable. For example, you can prove that $\int_0^t \frac{1}{1 + x^2}dx = \arctan t$ by choosing the tags $\xi_i \in [x_{i-1},x_i]$ such that $\frac{1}{1 + \xi_i^2} = \frac{1}{1 + x_{i-1}x_i}$. I shall not go into details and I do not claim that this is an elegant proof, but it shows that general tags can be useful.
b) If you have integrable functions $f,g$, you know that their product $fg$ is integrable. Then you may choose any two $\xi_i, \xi'_i \in [x_{i-1},x_i]$ and consider the sums
$$\sum_{i=1}^n f(\xi_i)g(\xi'_i)(x_i - x_{i-1}) .$$
These converge to $\int_a^b f(x)g(x)dx$ as $\|P\| \to 0$, as was shown by G. A. Bliss.
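Both arguments can be checked numerically with a short sketch. It assumes $t>0$ and uniform partitions, takes $\xi_i=\sqrt{x_{i-1}x_i}$ in a), and in b) uses the illustrative choice $f(x)=g(x)=x$ on $[0,1]$ with $\xi_i$ the left and $\xi'_i$ the right endpoint; all function names are hypothetical.

```python
import math

def uniform_partition(a, b, n):
    # n + 1 equally spaced points from a to b.
    return [a + (b - a) * i / n for i in range(n + 1)]

def arctan_tag_sum(t, n):
    # a) Tags xi_i = sqrt(x_{i-1} x_i) give f(xi_i) = 1 / (1 + x_{i-1} x_i),
    #    so each term of the Riemann sum is (x_i - x_{i-1}) / (1 + x_{i-1} x_i).
    P = uniform_partition(0.0, t, n)
    return sum((b - a) / (1 + a * b) for a, b in zip(P, P[1:]))

def two_tag_sum(f, g, P, tag_f, tag_g):
    # b) Sum of f(xi_i) g(xi'_i) (x_i - x_{i-1}) with independently chosen tags.
    return sum(f(tag_f(a, b)) * g(tag_g(a, b)) * (b - a) for a, b in zip(P, P[1:]))

t = 1.0
approx_arctan = arctan_tag_sum(t, 10_000)
print(approx_arctan, math.atan(t))

P = uniform_partition(0.0, 1.0, 1000)
approx_product = two_tag_sum(lambda x: x, lambda x: x, P,
                             lambda a, b: a,   # xi_i  = left endpoint
                             lambda a, b: b)   # xi'_i = right endpoint
print(approx_product)  # compare with the integral of x^2 over [0, 1], i.e. 1/3
```

In a), each term equals $\tan(\arctan x_i - \arctan x_{i-1})$ by the subtraction formula for the tangent, which is why the sum is close to the telescoping sum $\arctan t$.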