[Math] Mathematical notation for expressing the top n elements

logic notation

I would like to know the mathematical notation for expressing the top $n$ elements. Consider the equation below.
Here $x_w$ is a feature vector representing the contribution of a particular word $w$, and $N_w$ is the count (frequency) of that word. In the following equation, I assign a weight of 0 to (1) the top 50 words and (2) words with frequency $< 10$. I would like to know how to express "the top 50 words" in better mathematical notation.
$x_{w} = \begin{cases}
0 & \text{ if } N_w\in \text{top $50$ elements} \\
0 & \text{ if } N_w < 10 \\
N_w & \text{ otherwise }
\end{cases}$

Best Answer

This isn't possible. There could be overlap in your categories. For example, the top 50 elements could all have frequency less than 10. Because the first two categories have the same value of zero, perhaps you could combine them. Even then you could run into trouble if the data set is smaller than 50.

Instead of $N_w$, write $N(w)$ for the number of occurrences of the word $w$. Index the distinct words so that $i < j$ whenever $N(w_i) > N(w_j)$, breaking ties between equally frequent words arbitrarily.
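As a rough illustration of this indexing, here is a minimal Python sketch. The function name `rank_words`, the sample sentence, and the alphabetical tie-break are all assumptions made for the example; the answer itself does not say how words with equal counts should be ordered.

```python
from collections import Counter

def rank_words(tokens):
    """Order the distinct words so that i < j implies N(w_i) >= N(w_j)."""
    counts = Counter(tokens)  # counts[w] plays the role of N(w)
    # Sort by descending count. Ties are broken alphabetically, which is
    # an assumption of this sketch; any fixed rule would do.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return ordered, counts

tokens = "the cat saw the dog and the cat ran".split()
ordered, counts = rank_words(tokens)
# ordered[0] corresponds to w_1, ordered[1] to w_2, and so on.
```

Because Python lists are 0-based, the word $w_i$ sits at position `i - 1` of `ordered`.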

We have now fully ordered the set of distinct words. Let $T = \{ w_i \mid i \le 50 \text{ or } N(w_i) < 10 \}$. You can now replace the "top 50 elements" condition with $w \in T$ and collapse the first two cases into one.
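With $T$ in hand, the collapsed definition could be written roughly as follows. This is a sketch that keeps $N(w)$ in place of $N_w$ and assumes ties in the ordering have been broken by some fixed rule:

$x_{w} = \begin{cases}
0 & \text{ if } w \in T \\
N(w) & \text{ otherwise }
\end{cases}$

If there are fewer than 50 distinct words, every index satisfies $i \le 50$, so every word lies in $T$ and all weights are 0.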
