One way to prove this result is to first show it for functions on the manifold, and use this to further prove it for vector fields, co-vector fields and finally general tensor fields. The advantage of this approach as it requires the minimum of formulas.
We will need the following assumptions:
(a) The Lie derivative follows the Leibnitz rule when acting on a product of objects.
(b) $ \mathcal{L}_{X} (f) = X(f) $
Part 1 - Show for a function, f:
Using (b) we have
$$ \mathcal{L}_X \mathcal{L}_Y f = \mathcal{L}_X ( Y(f)) = X(Y(f)) $$
In a coordinate basis this is:
$$ X^{\rho} \partial_{\rho} (Y^{\sigma} \partial_{\sigma} f) = X^{\rho} ( \partial_{\rho} Y^{\sigma}) (\partial_{\sigma} f) + X^{\rho} Y^{\sigma} (\partial_{\rho} \partial_{\sigma} f) $$
Because $ \partial_{\rho} \partial_{\sigma} f $ is symmetric in $\rho \leftrightarrow \sigma $ then the second term on the RHS is symmetric in $ X \leftrightarrow Y $. Therefore:
$$ \mathcal{L}_X \mathcal{L}_Y f - \mathcal{L}_Y \mathcal{L}_X f = X^{\rho} ( \partial_{\rho} Y^{\sigma}) (\partial_{\sigma} f) - Y^{\rho} ( \partial_{\rho} X^{\sigma}) (\partial_{\sigma} f) = (X^{\rho} ( \partial_{\rho} Y^{\sigma}) - X^{\rho} ( \partial_{\rho} Y^{\sigma}))(\partial_{\sigma} f) $$
$$ = [X,Y]^{\sigma} \partial_{\sigma} f = [X,Y](f) = \mathcal{L}_{[X,Y]} f $$
Part 2: Show for a vector field, $ T = V \in TM $
Using the result from Part 1 we have:
$$ \mathcal{L}_{[X,Y]} (V(f)) = \mathcal{L}_{X} \mathcal{L}_{Y} (V(f)) - \mathcal{L}_{Y} \mathcal{L}_{X} (V(f)) $$
Using the Leibnitz rule:
$$ \mathcal{L}_{X} \mathcal{L}_{Y} (V(f)) = \mathcal{L}_{X} (( \mathcal{L}_{Y} V)f) + \mathcal{L}_{X}( V(\mathcal{L}_{Y} f) ) = (\mathcal{L}_{X} \mathcal{L}_{Y} V)(f) + V(\mathcal{L}_{X} \mathcal{L}_{Y}f) + (\mathcal{L}_{X} V)(\mathcal{L}_{Y} f) + (\mathcal{L}_{Y} V)(\mathcal{L}_{X} f) $$
The last 2 terms taken together are symmetric in $ X \leftrightarrow Y $. We therefore have that
$$ \mathcal{L}_{[X,Y]} (V(f)) = \mathcal{L}_{X} \mathcal{L}_{Y} (V(f)) - \mathcal{L}_{Y} \mathcal{L}_{X} (V(f)) = (\mathcal{L}_{X} \mathcal{L}_{Y} V - \mathcal{L}_{Y} \mathcal{L}_{X} V)(f) + V(\mathcal{L}_{X} \mathcal{L}_{Y} f - \mathcal{L}_{Y} \mathcal{L}_{X}f) $$
But we can also apply the Leibnitz rule to $\mathcal{L}_{[X,Y]} (V(f))$ to give:
$$ \mathcal{L}_{[X,Y]} (V(f)) = (\mathcal{L}_{[X,Y]} V)(f) + V(\mathcal{L}_{[X,Y]} f)) = (\mathcal{L}_{[X,Y]} V)(f) + V(\mathcal{L}_{X} \mathcal{L}_{Y} f - \mathcal{L}_{Y} \mathcal{L}_{X}f))$$
Comparing these 2 expressions we get that:
$$ (\mathcal{L}_{[X,Y]} V)(f) = (\mathcal{L}_{X} \mathcal{L}_{Y} V - \mathcal{L}_{Y} \mathcal{L}_{X} V)(f) $$
This is true for any function hence we have shown the identity for a vector field.
Part 3 - Show for a covector field, $ T = \eta \in T^{*}M $
We consider the action of $ \mathcal{L}_{[X,Y]} $ on the function $ \eta (V) $ where $ \eta $ is a covector field and V is a vector field. This works exactly the same as in part 2 giving us:
$$(\mathcal{L}_{[X,Y]} \eta)(V) = (\mathcal{L}_{X} \mathcal{L}_{Y} \eta - \mathcal{L}_{Y} \mathcal{L}_{X} \eta)(V) $$
This is true for any vector field hence the result is shown for covector fields.
Part 4 - General Tensor field, T
We will now consider the action of $ \mathcal{L}_{[X,Y]} $ on the function $T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s)$, where T is an (r,s) rank tensor, $ X_i $ is a vector field, and $ \eta_i $ is a covector field. This step proceeds similarly to before.
By the result from Part 1:
$$ \mathcal{L}_{[X,Y]}T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) = (\mathcal{L}_{X} \mathcal{L}_{Y} - \mathcal{L}_{Y} \mathcal{L}_{X})T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) $$
Similarly to Part 2 $ \mathcal{L}_{X} \mathcal{L}_{Y} $ acting on $ T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) $ gives terms symmetric in $ X \leftrightarrow Y $ when the Lie derivatives hit seperate terms. These terms will cancel with terms contained in $ - \mathcal{L}_{Y} \mathcal{L}_{X} T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) $. Therefore we are left with:
$$ (\mathcal{L}_{X} \mathcal{L}_{Y} - \mathcal{L}_{Y} \mathcal{L}_{X})T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) = (\mathcal{L}_{X} \mathcal{L}_{Y}T - \mathcal{L}_{Y} \mathcal{L}_{X}T)(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T((\mathcal{L}_{X}\mathcal{L}_{Y} - \mathcal{L}_{Y} \mathcal{L}_{X})X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T(X_1,(\mathcal{L}_{X}\mathcal{L}_{Y} - \mathcal{L}_{Y} \mathcal{L}_{X})X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + \dots $$
Using the results from Parts 2 and 3 gives:
$$ \mathcal{L}_{[X,Y]}T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) =(\mathcal{L}_{X} \mathcal{L}_{Y}T - \mathcal{L}_{Y} \mathcal{L}_{X}T)(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T(\mathcal{L}_{[X,Y]}X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T(X_1,\mathcal{L}_{[X,Y]}X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + \dots $$
Lastly, the use of the Leibnitz rule gives
$$ \mathcal{L}_{[X,Y]}T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) = \mathcal{L}_{[X,Y]}T(X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T(\mathcal{L}_{[X,Y]}X_1,X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + T(X_1,\mathcal{L}_{[X,Y]}X_2,...,X_r,\eta_1,\eta_2,...,\eta_s) + \dots $$
Comparing the 2 equalities and noting that they are true for any vector fields $ X_1,X_2,...,X_r $ and any covector fields $ \eta_1,\eta_2,...,\eta_s $ proves that the desired result holds for any general tensor field T.
Best Answer
In Gauge fields, knots and gravity by J. Baez and J. P. Muniain authors got the following two pictures to visualize commutator:
I hope it helps you.