OK, let me do the reformulation. Let $f$ be a positive function defined for $x \ge 0$, and define $c(\theta)^{-1} = \int_0^{\theta} f(x) \, dx$. Then we can define a probability density, parameterized by $\theta$, by $p_{\theta}(x) = c(\theta) f(x) \, I(0 \le x \le \theta)$, where $I(\cdot)$ denotes the indicator function of its argument.
Suppose $x_1, \dots, x_n$ is an iid sample from this density. Then the density of the sample can be written
\begin{equation}
p_{\theta}(x_1, \dots, x_n) = c(\theta)^n \prod_{i=1}^n f(x_i) \prod_{i=1}^n I(0\le x_i \le \theta)
\end{equation}
The last factor above equals $1$ if $x_{(n)} \le \theta$ and $0$ otherwise, where $x_{(n)} = \max_i x_i$ denotes the largest observation; in other words, it equals $I(x_{(n)} \le \theta)$. The result then follows from the factorization theorem.
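Explicitly, the joint density splits into exactly the form the factorization theorem requires:
\begin{equation}
p_{\theta}(x_1, \dots, x_n) = \underbrace{c(\theta)^n \, I\bigl(x_{(n)} \le \theta\bigr)}_{g_{\theta}(x_{(n)})} \cdot \underbrace{\prod_{i=1}^n f(x_i)}_{h(x_1, \dots, x_n)},
\end{equation}
so the single statistic $x_{(n)}$ is sufficient for $\theta$.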
I think the best way to understand sufficiency is to consider familiar examples. Suppose we flip a (not necessarily fair) coin, where the probability of obtaining heads is some unknown parameter $p$. Then the individual trials are IID ${\rm Bernoulli}(p)$ random variables, and we can think of the outcome of $n$ trials as a vector $\boldsymbol X = (X_1, X_2, \ldots, X_n)$. Intuition tells us that, for a large number of trials, a "good" estimate of $p$ is the statistic $$\bar X = \frac{1}{n} \sum_{i=1}^n X_i.$$ Now suppose I perform such an experiment. Could you estimate $p$ equally well if I told you only $\bar X$, rather than the full vector $\boldsymbol X$? Sure. This is what sufficiency does for us: the statistic $T(\boldsymbol X) = \bar X$ is sufficient for $p$ because it preserves all the information about $p$ that we can extract from the original sample $\boldsymbol X$. (Proving this claim rigorously, however, requires more work.)
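One consequence of sufficiency that we can check empirically: the conditional distribution of the full sample given $T(\boldsymbol X) = \sum_{i=1}^n X_i$ does not depend on $p$. Here is a minimal simulation sketch in Python (the helper `conditional_freqs` and the constants are my own, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_freqs(p, n=4, t=2, reps=200_000):
    """Among simulated samples whose sum equals t, tabulate how often
    each individual 0/1 sequence occurs. If the sum is sufficient,
    these conditional frequencies should not depend on p."""
    samples = rng.binomial(1, p, size=(reps, n))
    hits = samples[samples.sum(axis=1) == t]
    seqs, counts = np.unique(hits, axis=0, return_counts=True)
    return seqs, counts / counts.sum()

for p in (0.3, 0.7):
    seqs, freqs = conditional_freqs(p)
    print(f"p = {p}:")
    for s, f in zip(seqs, freqs):
        print("  ", s, round(f, 3))
```

Both runs print roughly $1/6$ for each of the $\binom{4}{2} = 6$ sequences with two heads: once you know the total, the particular arrangement tells you nothing more about $p$.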
Here is a less trivial example. Suppose I have $n$ IID observations from a ${\rm Uniform}(0,\theta)$ distribution, where $\theta$ is the unknown parameter. What is a sufficient statistic for $\theta$? For instance, suppose I take $n = 5$ samples and obtain $\boldsymbol X = (3, 1, 4, 5, 4)$. Your estimate of $\theta$ clearly must be at least $5$, since you observed that value; but that is all the information about $\theta$ you can extract from the sample. The other observations convey no additional information about $\theta$ once you have observed $X_4 = 5$. So we would intuitively expect the statistic $$T(\boldsymbol X) = X_{(n)} = \max \boldsymbol X$$ to be sufficient for $\theta$. To prove this, we would write down the joint density of $\boldsymbol X$ given $\theta$ and apply the Factorization Theorem (I omit the details to keep the discussion informal).
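The same kind of empirical check works here: conditional on the maximum, the remaining observations behave like IID ${\rm Uniform}(0, X_{(n)})$ draws, regardless of $\theta$. A minimal sketch (the helper `rescaled_rest` is a hypothetical name of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def rescaled_rest(theta, n=5, reps=100_000):
    """Draw Uniform(0, theta) samples and divide the non-maximal
    observations by the sample maximum. If the maximum is sufficient,
    the distribution of these ratios should not depend on theta."""
    x = rng.uniform(0, theta, size=(reps, n))
    m = x.max(axis=1, keepdims=True)
    rest = np.sort(x, axis=1)[:, :-1] / m  # drop the max, then rescale
    return rest.ravel()

for theta in (1.0, 10.0):
    r = rescaled_rest(theta)
    print(f"theta = {theta}: mean = {r.mean():.3f}, sd = {r.std():.3f}")
```

For both values of $\theta$ the ratios have mean $\approx 0.5$ and standard deviation $\approx 0.289$, matching IID ${\rm Uniform}(0,1)$: beyond the maximum, the sample carries no further information about $\theta$.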
Note that a sufficient statistic is not necessarily scalar-valued: it may not be possible to reduce the complete sample to a single scalar. This commonly arises when we want sufficiency for multiple parameters (which we can equivalently regard as a single vector-valued parameter). For example, a sufficient statistic for a Normal distribution with unknown mean $\mu$ and standard deviation $\sigma$ is $$\boldsymbol T(\boldsymbol X) = \left( \frac{1}{n} \sum_{i=1}^n X_i, \sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2} \right),$$ i.e., the sample mean and the sample standard deviation. (The first component is an unbiased estimator of $\mu$, and the square of the second is an unbiased estimator of $\sigma^2$; the sample standard deviation itself is slightly biased for $\sigma$.) We can show that this is the maximum data reduction that can be achieved.
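To see why this pair is sufficient, expand the joint density (a standard computation, sketched here for completeness): $$p_{\mu,\sigma}(\boldsymbol x) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^n x_i - \frac{n\mu^2}{2\sigma^2} \right).$$ The data enter only through $\sum_i x_i$ and $\sum_i x_i^2$, and these two sums determine, and are determined by, the pair $\boldsymbol T(\boldsymbol X)$ above, so the Factorization Theorem applies.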
Note also that a sufficient statistic is not unique. In the coin-toss example, if I give you $\bar X$, that lets you estimate $p$; but if I give you $\sum_{i=1}^n X_i$ instead, you can still estimate $p$. In fact, any one-to-one function $g$ of a sufficient statistic $T(\boldsymbol X)$ is also sufficient, since you can invert $g$ to recover $T$. So for the Normal example with unknown mean and standard deviation, I could equally have claimed that $\left( \sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2 \right)$, i.e., the sum and the sum of squares of the observations, is sufficient for $(\mu, \sigma)$. Indeed, the non-uniqueness of sufficiency is even more obvious: $\boldsymbol T(\boldsymbol X) = \boldsymbol X$ is always sufficient for any parameter(s), since the original sample contains as much information as we can ever gather.
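To make the one-to-one correspondence concrete, here is a small Python sketch (the variable names are mine) recovering the (mean, standard deviation) pair from the (sum, sum of squares) pair:

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.normal(loc=3.0, scale=2.0, size=50)
n = len(x)
s1, s2 = x.sum(), (x ** 2).sum()  # the alternative sufficient statistic

# Invert the one-to-one map back to (sample mean, sample sd),
# using sum((x - xbar)^2) = s2 - n * xbar^2:
xbar = s1 / n
s = np.sqrt((s2 - n * xbar ** 2) / (n - 1))

print(np.isclose(xbar, x.mean()))    # True
print(np.isclose(s, x.std(ddof=1)))  # True
```

Either pair can be computed from the other, so they carry exactly the same information about $(\mu, \sigma)$.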
In summary, sufficiency is a desirable property because it lets us show formally that a statistic achieves data reduction without losing information about the parameter. A sufficient statistic that achieves the maximum possible data reduction is called a minimal sufficient statistic.
Following the comments of @whuber and @Kamster, I think I now have a better understanding. When we say that a sufficient statistic contains all the information needed to estimate the parameter, one concrete consequence is that it suffices for computing the maximum likelihood estimator: the MLE (when it is unique) is a function of any sufficient statistic.
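To see this concretely in the coin-toss example above: the likelihood of the sample is $$L(p) = p^{t}(1-p)^{n-t}, \qquad t = \sum_{i=1}^n x_i,$$ and setting $\frac{d}{dp} \log L(p) = \frac{t}{p} - \frac{n-t}{1-p} = 0$ gives $\hat p = t/n$. The MLE depends on the data only through the sufficient statistic $t$.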
Given that I am answering my own question, I am not 100% sure of the answer, so I will not mark it as correct until I get some feedback. Please add comments, and downvote if you think I am wrong or imprecise.
(Let me know if this is not compatible with SE etiquette; this being my first question, I beg your clemency if I am violating any rule.)