Lemma: Let $X,Y$ be Banach spaces. If $T_n:X\to Y$ are bounded operators with finite dimensional range and $T_n\to T\in B(X,Y)$ in operator norm, then $T$ is a compact operator.

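For the case $X=Y=L^p[0,1]$, $1< p\le\infty$, we will write the operator $T(f)(x)=\int_0^xf(t)dt$ as a norm limit of finite-rank operators and deduce compactness of $T$ from the above lemma. (The computations below are written for $p<\infty$; for $p=\infty$ the same pointwise estimates hold, with the essential supremum in place of the $L^p$ integrals.)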

Let $n>1$ be an integer and partition the interval $[0,1]$ into $n$ subintervals of equal length, namely $I_{j,n}:=[\frac{j-1}{n},\frac{j}{n})$, $j=1,\dots,n$. Let $\phi_{j,n}$ denote the indicator function of $I_{j,n}$. Set
$$T_n:L^p[0,1]\to L^p[0,1], \;\;\;T_n(f)=\sum_{j=1}^n\bigg(\int_0^{\frac{j}{n}}f(t)dt\bigg)\cdot\phi_{j,n}$$
Note that the operator $T_n$ automatically has finite-dimensional range, since $T_n(f)$ lies in the linear span of the functions $\{\phi_{j,n}:j=1,\dots,n\}\subset L^p[0,1]$. Also, for $x\in[0,1)$ there exists a unique $j_x\in\{1,\dots,n\}$ such that $x\in I_{j_x,n}$, so $\phi_{i,n}(x)=1$ if and only if $i=j_x$ (the single point $x=1$ has measure zero and can be ignored). So

$$|T_n(f)(x)|^p=\bigg|\sum_{j=1}^n\bigg(\int_0^{\frac{j}{n}}f(t)dt\bigg)\cdot\phi_{j,n}(x)\bigg|^p=\bigg|\int_0^{\frac{j_x}{n}}f(t)dt\bigg|^p\le\bigg(\int_0^1|f(t)|dt\bigg)^p\le\|f\|_p^p$$

where in the last inequality we have used the fact that $\|f\|_1\le\|f\|_p$ for $f\in L^p[0,1]$ (Hölder's inequality on a space of measure $1$). Thus
$$\|T_n(f)\|_p^p=\int_0^1|T_n(f)(x)|^pdx\le\int_0^1\|f\|_p^pdx=\|f\|_p^p$$
and thus $\|T_n\|\le1$, so the $T_n$ are indeed bounded operators of finite rank.
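To make the construction concrete, here is a minimal numerical sketch. It assumes $f$ is given as a Python callable on $[0,1]$ and approximates every integral with a composite midpoint rule (an assumption; any quadrature would do). The helper names `interval_index`, `integral`, `T_n` and `lp_norm` are hypothetical. It computes $j_x$, evaluates $T_n(f)$ as a step function, and checks $\|T_n(f)\|_p\le\|f\|_p$ for a sample $f$.

```python
import math


def interval_index(x, n):
    """Return j_x in {1, ..., n} with x in I_{j,n} = [(j-1)/n, j/n).

    The point x = 1 lies in no I_{j,n}; we assign it to j = n, a
    measure-zero choice that does not affect any L^p norm.
    """
    return min(int(n * x) + 1, n)


def integral(g, a, b, m=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


def T_n(f, n):
    """Step function T_n(f) = sum_j (int_0^{j/n} f) * phi_{j,n}."""
    coeffs = [integral(f, 0.0, j / n) for j in range(1, n + 1)]
    # Only phi_{j_x, n} is nonzero at x, so pick that single coefficient.
    return lambda x: coeffs[interval_index(x, n) - 1]


def lp_norm(g, p, m=2000):
    """Midpoint-rule approximation of ||g||_p on [0, 1] for finite p."""
    return integral(lambda x: abs(g(x)) ** p, 0.0, 1.0, m) ** (1.0 / p)


# Sample function (hypothetical choice for illustration).
f = lambda t: math.sin(2 * math.pi * t) + 0.5

for p in (1.5, 2.0, 3.0):
    for n in (2, 5, 20):
        print(f"p={p}, n={n}: ||T_n f||_p = {lp_norm(T_n(f, n), p):.4f}, "
              f"||f||_p = {lp_norm(f, p):.4f}")
```

The step-function representation mirrors the finite-rank structure: $T_n(f)$ is determined by the $n$ numbers `coeffs`.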

We now show that $\|T_n-T\|_p\to0$. Indeed, we have
$$|T_n(f)(x)-T(f)(x)|^p=\bigg|\sum_{j=1}^n\int_0^{\frac{j}{n}}f(t)dt\cdot \phi_{j,n}(x)-\int_0^xf(t)dt\bigg|^p=$$ $$=\bigg|\int_0^\frac{j_x}{n}f(t)dt-\int_0^xf(t)dt\bigg|^p\;\;(\star)$$
where $j_x\in\{1,\dots,n\}$ is the unique integer such that $x\in I_{j_x,n}$ (and the above equality occurs because $\phi_{j_x,n}(x)=1$ and $\phi_{i,n}(x)=0$ for $i\ne j_x$). Continuing from $(\star)$, if we denote by $q$ the conjugate exponent $(1/p+1/q=1)$, we have
$$(\star)=\bigg|\int_x^{\frac{j_x}{n}}f(t)dt\bigg|^p\le\bigg(\int_x^{\frac{j_x}{n}}|f(t)|dt\bigg)^p\le$$ $$\le\bigg(\int_0^1\chi_{I_{j_x,n}}(t)\cdot|f(t)|dt\bigg)^p\le\bigg(\mu(I_{j_x,n})^{1/q}\cdot\|f\|_p\bigg)^p=\frac{1}{n^{p/q}}\cdot\|f\|_p^p$$
where the second inequality holds because $[x,\frac{j_x}{n})\subseteq I_{j_x,n}$, and the third is Hölder's inequality. Therefore,
$$\|T_n(f)-T(f)\|_p^p=\int_0^1|T_n(f)(x)-T(f)(x)|^pdx\le\int_0^1\frac{1}{n^{p/q}}\cdot\|f\|_p^pdx=\frac{1}{n^{p/q}}\cdot\|f\|_p^p$$
and thus $\|T_n-T\|\le\frac{1}{n^{1/q}}$. Since $p>1$ we have $q<\infty$, so letting $n\to\infty$ gives $T_n\to T$ in operator norm, and $T$ is compact by the lemma.
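The rate $\|T_n(f)-T(f)\|_p\le n^{-1/q}\|f\|_p$ can be sanity-checked numerically. This sketch assumes $f$ comes with an antiderivative $F$ satisfying $F(0)=0$, so that $T(f)(x)=F(x)$ and $T_n(f)(x)=F(j_x/n)$ exactly; only the outer $L^p$ norm is approximated (midpoint rule). The test function $f(t)=3t^2$ and the helper names are hypothetical choices for illustration.

```python
import math


def interval_index(x, n):
    """j_x in {1, ..., n} with x in [(j-1)/n, j/n); x = 1 goes to j = n (measure zero)."""
    return min(int(n * x) + 1, n)


def lp_norm(g, p, m=4000):
    """Midpoint-rule approximation of ||g||_p on [0, 1] for finite p."""
    h = 1.0 / m
    return (h * sum(abs(g((k + 0.5) * h)) ** p for k in range(m))) ** (1.0 / p)


def error_and_bound(F, f_norm_p, n, p):
    """Return (||T_n f - T f||_p, n^(-1/q) * ||f||_p).

    F is an antiderivative of f with F(0) = 0 (an assumption of this sketch),
    so T(f)(x) = F(x) and T_n(f)(x) = F(j_x / n).
    """
    q = p / (p - 1.0)  # conjugate exponent; finite because p > 1
    err = lp_norm(lambda x: F(interval_index(x, n) / n) - F(x), p)
    return err, n ** (-1.0 / q) * f_norm_p


# Hypothetical test function f(t) = 3 t^2 with F(t) = t^3 and
# ||f||_p = 3 * (2p + 1)^(-1/p), computed exactly.
F = lambda t: t ** 3
for p in (1.5, 2.0, 4.0):
    f_norm = 3.0 * (2.0 * p + 1.0) ** (-1.0 / p)
    for n in (1, 4, 16, 64):
        err, bound = error_and_bound(F, f_norm, n, p)
        print(f"p={p}, n={n}: error={err:.5f}, bound={bound:.5f}")
```

For this particular $f$ the errors decrease with $n$ and stay below the bound; the estimate in the text is a worst-case bound over all $f$ with $\|f\|_p\le1$, so a smooth example may do noticeably better.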

P.S.:
Why is it reasonable to define the operators $T_n$ the way we did? First, we need them to have finite-dimensional range. Second, look at $T(f)(x)=\int_0^xf(t)dt$. This number is very close to $\int_0^{j/n}f(t)dt$ for suitable $j,n$. So it feels natural to partition the unit interval into small intervals of length $1/n$ and define $T_n(f)$ by the rule "take an $x$, determine which small interval it lies in (i.e. find the proper $j_x$), then assign the value $\int_0^{j_x/n}f(t)dt$". Multiplying by $\phi_{j,n}$ and adding up is just a way of writing this rule as a single formula: for each $x$ only the term with the correct $j_x$ survives. I hope this helps you understand the reasoning here.

First we show that $T(f)$ is differentiable at every $x\ne0$, and deduce an upper bound for its derivative in terms of $\|f\|$. If $x\not=0$, then by the product rule and the fundamental theorem of calculus, $$\frac{dT(f)}{dx}=-\frac{2}{x^3}\int_0^x t^2f(t)dt+\frac{1}{x^2} x^2f(x)$$ and, since $|f(t)|\le\|f\|$ for all $t$, $$\bigg|\frac{dT(f)}{dx}\bigg|\le \frac{2}{x^3}\int_0^x t^2\|f\|dt + \|f\|=\Big(\frac{2}{3}+1\Big)\|f\|=\frac{5}{3}\|f\|$$
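The derivative formula and the bound $\frac53\|f\|$ can be sanity-checked numerically. The sketch below assumes the operator is $T(f)(x)=\frac{1}{x^2}\int_0^x t^2f(t)\,dt$ (the form implied by the product-rule computation above) on $C[0,1]$ with the sup norm, approximates the integral with a midpoint rule, and uses the hypothetical test function $f(t)=\cos 7t$, whose sup norm on $[0,1]$ is $1$. It compares the formula with a central finite difference.

```python
import math


def integral(g, a, b, m=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


def T(f, x):
    """Assumed operator: T(f)(x) = x^{-2} * int_0^x t^2 f(t) dt for x != 0."""
    return integral(lambda t: t * t * f(t), 0.0, x) / x ** 2


def dT(f, x):
    """Derivative from the product rule and the fundamental theorem of calculus."""
    return -2.0 / x ** 3 * integral(lambda t: t * t * f(t), 0.0, x) + f(x)


f = lambda t: math.cos(7.0 * t)  # hypothetical test function, sup norm 1 on [0, 1]
sup_f = 1.0
h = 1e-4  # step of the central difference
for x in (0.05, 0.3, 0.7, 0.99):
    fd = (T(f, x + h) - T(f, x - h)) / (2 * h)
    print(f"x={x}: formula={dT(f, x):+.6f}, finite difference={fd:+.6f}, "
          f"bound 5/3*||f||={5 / 3 * sup_f:.4f}")
```

For the constant function $f\equiv1$ one has $T(f)(x)=x/3$, so the formula should return $\frac13$; this gives an exact check independent of the quadrature.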

We could also treat the special case $x=0$ (with L'Hôpital's rule), but this is unnecessary for the mean value theorem, which only needs differentiability in the interior of the interval (together with continuity of $T(f)$ on $[x_1,x_2]$). It gives, for any $0\le x_1<x_2\le 1$, some $x_0\in (x_1, x_2)$ with $$T(f)(x_2)-T(f)(x_1)=\frac{dT(f)}{dx}(x_0)(x_2-x_1),$$ hence $$|T(f)(x_2)-T(f)(x_1)|\le O(1)\|f\|(x_2-x_1).$$
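As a rough numerical illustration of this Lipschitz estimate, and of the fact that the constant does not depend on the particular $f$, the sketch below assumes the operator $T(f)(x)=\frac{1}{x^2}\int_0^x t^2f(t)\,dt$, extended by $T(f)(0)=0$ (its limit, since $|T(f)(x)|\le\frac{x}{3}\|f\|$). It measures the largest difference quotient on a grid for a few hypothetical functions $\cos(kt)$, each with sup norm $1$ on $[0,1]$; the quotients should stay below $\frac53$.

```python
import math


def integral(g, a, b, m=400):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


def T(f, x):
    """Assumed operator T(f)(x) = x^{-2} int_0^x t^2 f(t) dt, with T(f)(0) = 0."""
    if x == 0.0:
        return 0.0
    return integral(lambda t: t * t * f(t), 0.0, x) / x ** 2


def lipschitz_ratio(f, xs):
    """Largest |T(f)(x2) - T(f)(x1)| / (x2 - x1) over all grid pairs x1 < x2."""
    vals = [T(f, x) for x in xs]
    return max(abs(vals[j] - vals[i]) / (xs[j] - xs[i])
               for i in range(len(xs)) for j in range(i + 1, len(xs)))


xs = [i / 20 for i in range(21)]
ks = (1, 5, 10, 20, 40)
# Hypothetical family of functions with sup norm 1 on [0, 1].
family = [(k, lambda t, k=k: math.cos(k * t)) for k in ks]
for k, f in family:
    print(f"k={k}: max difference quotient = {lipschitz_ratio(f, xs):.4f} (bound 5/3)")
```

The same constant works for every member of the family, which is the uniform equicontinuity used in the next step.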

That is, for any family of functions $f$ with $\|f\|\le M$, the functions $T(f)$ are uniformly equicontinuous (indeed uniformly Lipschitz) and uniformly bounded by $\|T\|M$. Hence, by the Arzelà–Ascoli theorem, every sequence in the family $\{T(f)\}$ has a uniformly convergent subsequence, i.e. the family is precompact.

## Best Answer

By definition $A$ is compact if the image $A(B)$ is relatively compact for every bounded subset $B \subseteq C([a,b])$.

As you noticed correctly, you can use Arzelà-Ascoli to show this. Let $B \subseteq C([a,b])$ be bounded with $B \subseteq B_M(0)$. For (uniform) boundedness observe that for every $x \in B, t\in [a,b]$ we have $$ |(Ax)(t)| = \left| \int_a^b F(s,t, x(s))\, ds \right| \leq \int_a^b |F(s,t,x(s))|\, ds \leq (b-a) C $$ with $C = \max_{(s,t,x) \in [a,b]^2 \times [-M,M]} |F(s,t,x)|$. This maximum exists since $F$ is continuous on the compact set $[a,b]^2 \times [-M,M]$, and it applies because $|x(s)|\le M$ for every $x\in B$.
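For a concrete feel of the bound $(b-a)C$, the sketch below uses a hypothetical kernel $F(s,t,x)=\sin(st)+x^2$ on $[a,b]=[0,2]$ with $M=1.5$, for which $C=\max|F|=1+M^2$ can be computed by hand. It approximates $(Ax)(t)$ with a midpoint rule and compares $\sup_t|(Ax)(t)|$ over a grid with $(b-a)C$ for a few functions in $B_M(0)$. The kernel, interval and test functions are illustrative assumptions, not data from the question.

```python
import math


def integral(g, a, b, m=1000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


def A(F, x, a, b):
    """Return the function t -> (Ax)(t) = int_a^b F(s, t, x(s)) ds."""
    return lambda t: integral(lambda s: F(s, t, x(s)), a, b)


# Hypothetical data for illustration only.
a, b, M = 0.0, 2.0, 1.5
F = lambda s, t, x: math.sin(s * t) + x * x
C = 1.0 + M * M  # max |F| on [a,b]^2 x [-M, M] for this particular F

# A few functions with sup norm 0.99 * M < M, i.e. inside B_M(0).
family = [lambda s, k=k: 0.99 * M * math.sin(k * s) for k in (1, 3, 7)]
ts = [a + i * (b - a) / 50 for i in range(51)]

for x in family:
    Ax = A(F, x, a, b)
    sup_Ax = max(abs(Ax(t)) for t in ts)
    print(f"sup |Ax(t)| = {sup_Ax:.4f} <= (b - a) * C = {(b - a) * C:.4f}")
```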

Equicontinuity can be obtained as follows. Let $\epsilon > 0$. Then, as $F$ is uniformly continuous on $[a,b]^2 \times [-M,M]$, there is some $\delta>0$ such that for all $t_1, t_2 \in [a,b]$ we have $$ |t_1 - t_2| < \delta \quad \Rightarrow \quad | F(s, t_1, x) - F(s, t_2, x)| < {\epsilon \over b-a} $$ for all $s \in [a,b], x \in [-M,M]$.

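Hence, for every $x\in B$ and $t_1, t_2$ as above one obtains \begin{align*} |(Ax)(t_1) - (Ax)(t_2)| &= \left| \int_a^b \big(F(s,t_1, x(s)) - F(s, t_2, x(s))\big)\, ds \right| \\ &\leq \int_a^b | F(s,t_1, x(s)) - F(s, t_2, x(s))|\, ds \\ &\leq \int_a^b {\epsilon \over b-a}\, ds = \epsilon. \end{align*} Since $\delta$ does not depend on $x\in B$, the family $A(B)$ is equicontinuous. Together with the uniform bound above, the Arzelà-Ascoli theorem shows that $A(B)$ is relatively compact, i.e. $A$ is compact.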
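For a hypothetical kernel the $\delta$ from uniform continuity can be written down explicitly. Take $F(s,t,x)=\sin(st)+x^2$ on $[a,b]=[0,2]$ (an illustrative assumption): since $|\sin(st_1)-\sin(st_2)|\le s|t_1-t_2|\le b|t_1-t_2|$ for $s\in[0,b]$, the choice $\delta=\frac{\epsilon}{b(b-a)}$ gives $|F(s,t_1,x)-F(s,t_2,x)|<\frac{\epsilon}{b-a}$. The sketch below checks numerically that this single $\delta$ works for several functions $x$ at once, which is exactly the equicontinuity argument above. It is an illustration under these assumptions, not a proof.

```python
import math


def integral(g, a, b, m=1000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


def A(F, x, a, b):
    """Return the function t -> (Ax)(t) = int_a^b F(s, t, x(s)) ds."""
    return lambda t: integral(lambda s: F(s, t, x(s)), a, b)


# Hypothetical data for illustration only.
a, b, M = 0.0, 2.0, 1.5
F = lambda s, t, x: math.sin(s * t) + x * x


def delta_for(eps):
    """delta valid for this F only: |F(s,t1,x) - F(s,t2,x)| <= b*|t1 - t2| when a >= 0."""
    return eps / (b * (b - a))


eps = 0.1
delta = delta_for(eps)
# Functions with sup norm 0.99 * M, i.e. inside B_M(0).
family = [lambda s, k=k: 0.99 * M * math.sin(k * s) for k in (1, 3, 7, 15)]

worst = 0.0
for x in family:
    Ax = A(F, x, a, b)
    for i in range(21):
        t1 = a + i * (b - a) / 20
        t2 = min(t1 + 0.999 * delta, b)  # guarantees |t1 - t2| < delta
        worst = max(worst, abs(Ax(t1) - Ax(t2)))
print(f"largest |Ax(t1) - Ax(t2)| with |t1 - t2| < delta: {worst:.5f} (eps = {eps})")
```

The point of the check is that `delta` is computed once from `eps`, before any particular `x` is chosen.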