The idea is to find a way to change the data that does not alter $T$ but does alter the likelihood $P$ in an important way: namely, by an amount that truly depends on the parameter $\theta$. That would show that $T$ does not tell us all we need to know about $\theta$, insofar as it might be revealed through $P$. One elementary (yet fully rigorous) approach is sketched below.
Because the PDF of the underlying probability law is
$$f_\theta(x) = \frac{1}{\theta}\exp\left(-x/\theta\right),$$
the assumption that $X_1$ and $X_2$ are independent implies
$$P(\mathbf{x}) = f_\theta(x_1)f_\theta(x_2) = \frac{1}{\theta^2}\exp(-(x_1+x_2)/\theta).$$
Suppose (in order to derive a contradiction) that this likelihood can be factored as
$$P(\mathbf{x}) = h(\mathbf{x})g(\theta, T(\mathbf{x})) = h(x_1,x_2)g(\theta, x_1+2x_2).$$
Taking logarithms will simplify things:
$$-2\log\theta - \frac{x_1+x_2}{\theta}=\log P(\mathbf{x}) = \log h(x_1,x_2) + \log g(\theta, x_1+2x_2).\tag{1}$$
Since both $x_1$ and $x_2$ are almost surely positive, for sufficiently small nonzero $\epsilon$ with $0 \lt \epsilon \lt x_2$ both $x_1 + 2\epsilon$ and $x_2-\epsilon$ will still be positive, so it makes sense to plug them into both sides. Notice how this combination of changes in the $x_i$ was chosen to leave $g$ unchanged, because $$T(x_1,x_2) = T(x_1+2\epsilon, x_2-\epsilon).$$
At this juncture, compute the change in the right hand side of $(1)$ and the change in the left hand side when $(x_1,x_2)$ is changed to $(x_1+2\epsilon, x_2-\epsilon)$. (This requires only simple algebra.) Observe that the change in the RHS depends only on $x_1, x_2,$ and $\epsilon$ (by construction), but that the change in the LHS depends ineluctably on $\theta$ (because $\epsilon$ is nonzero). Draw your conclusions from this contradiction.
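For the record, here is that simple algebra spelled out. Replacing $(x_1,x_2)$ by $(x_1+2\epsilon, x_2-\epsilon)$ changes the left hand side of $(1)$ by
$$\left(-2\log\theta - \frac{(x_1+2\epsilon)+(x_2-\epsilon)}{\theta}\right) - \left(-2\log\theta - \frac{x_1+x_2}{\theta}\right) = -\frac{\epsilon}{\theta},$$
while it changes the right hand side by
$$\log h(x_1+2\epsilon, x_2-\epsilon) - \log h(x_1,x_2),$$
because the $g$ term is unchanged by construction. The first quantity varies with $\theta$ (since $\epsilon \ne 0$), while the second does not involve $\theta$ at all; this contradiction shows no such factorization exists, so $T(X_1,X_2)=X_1+2X_2$ is not sufficient.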
By the Neyman factorization theorem it is quite clear that the second option is a sufficient statistic. Just review the theorem:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that has pdf or pmf $f(x;\theta)$, $\theta \in \Omega$. The statistic $Y_1 = u_1(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if we can find two nonnegative functions, $k_1$ and $k_2$, such that
$$f(x_1, x_2, \ldots, x_n;\theta) = k_1[\,u_1(x_1, x_2, \ldots, x_n);\theta\,]\,k_2(x_1, x_2, \ldots, x_n),$$
where $k_1$ depends on the data $x_1, x_2, \ldots, x_n$ only through $u_1(x_1, x_2, \ldots, x_n)$, and $k_2(x_1, x_2, \ldots, x_n)$ does not depend on the parameter $\theta$.
Now look at your $L(\theta)$. The first part, $\left(\frac{2}{\theta}\right)^n e^{-\sum y_i^2/\theta}$, is the $k_1$ function: it depends on $y_1, y_2, \ldots, y_n$ only through $\sum y_i^2$ (treat $\theta$ as a constant here, since the factorization is read off for each fixed $\theta$). The second part, $\prod y_i$, does not depend on $\theta$; it is the $k_2$ function.
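Collecting the two pieces in one display makes the roles of $k_1$ and $k_2$ explicit:
$$L(\theta) = \underbrace{\left(\frac{2}{\theta}\right)^{n} \exp\!\left(-\frac{\sum_i y_i^2}{\theta}\right)}_{k_1\left(\sum_i y_i^2;\,\theta\right)}\;\underbrace{\prod_i y_i}_{k_2(y_1,\ldots,y_n)}.$$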
So by the factorization theorem, you can directly say that $\sum y_i^2$ is a sufficient statistic for $\theta$.
$\bar X$ is not a sufficient statistic because it does not contain all the information about $(\mu,\sigma^2)$, which is what it would mean for it to be sufficient.
However, $\bar X$ does contain all the information about $\mu$ in the sample, whether or not $\sigma^2$ is known. For example, $\bar X$ attains the Cramér-Rao bound. Similarly, if $\mu$ is not known, $s^2$ contains all the information about $\sigma^2$ (though not if $\mu$ is known, since $(\mu-\bar X)^2$ has information about $\sigma^2$). Having all the information about parts of a parameter is a more complicated property than sufficiency, though it has been studied (see, e.g., Sprott 1975).
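One standard way to see both points at once (a routine computation for an i.i.d. $N(\mu,\sigma^2)$ sample, using the identity $\sum_i (x_i-\mu)^2 = (n-1)s^2 + n(\bar x-\mu)^2$) is to write the joint density as
$$f(\mathbf{x};\mu,\sigma^2) = \left(2\pi\sigma^2\right)^{-n/2}\exp\!\left(-\frac{(n-1)s^2 + n(\bar x-\mu)^2}{2\sigma^2}\right).$$
The data enter only through the pair $(\bar x, s^2)$, so that pair is jointly sufficient for $(\mu,\sigma^2)$; the exponent, however, couples $s^2$ with $\sigma^2$, so the $s^2$ factor cannot be pushed into a parameter-free $k_2$, which is the obstruction to $\bar X$ alone being sufficient. When $\sigma^2$ is known, the only factor involving $\mu$ depends on the data through $\bar x$ alone.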