Solved – Are nominal attributes strict classifications and equivalent to enumerations in programming languages?

definition, measurement, ordinal-data

I've been looking at random forest algorithms for text classification and referencing the Mahout random forest decision tree description. It refers to two types of variables: nominal attributes and real-valued attributes.

However, I'm not particularly sure what a nominal attribute is. Real-valued attributes seem to be just numeric attributes, things you can measure with a real numeric value.

Looking around, I've found the "Level of Measurement" entry on Wikipedia, specifically the section on "Nominal Scale". It states:

At the nominal scale, i.e., for a nominal category, one uses labels;
for example, rocks can be generally categorized as igneous,
sedimentary and metamorphic. For this scale, some valid operations are
equivalence and set membership. Nominal measures offer names or labels
for certain characteristics.

Variables assessed on a nominal scale are called categorical
variables; see also categorical data.

Later on, it states (and I think this is the most crucial part):

The central tendency of a nominal attribute is given by its mode;
neither the mean nor the median can be defined.

This sounds a great deal to me like enumerations in programming languages (or any key-value mapping where each key is a distinct numeric value and each value is a category/label). I would rarely compute a mean, a median, or any other arithmetic result from the numeric keys, but I'd certainly be able to determine the mode (the category that is applied most often within a given set).
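To make the comparison concrete, here is a minimal Python sketch of that idea. The `RockType` enumeration and the `samples` list are made up for illustration (borrowing the rock categories from the Wikipedia quote); nothing here comes from Mahout.

```python
from collections import Counter
from enum import Enum


class RockType(Enum):
    # The numeric values are arbitrary identifiers, not measurements.
    IGNEOUS = 1
    SEDIMENTARY = 2
    METAMORPHIC = 3


# Hypothetical sample of observed rocks.
samples = [
    RockType.IGNEOUS,
    RockType.SEDIMENTARY,
    RockType.IGNEOUS,
    RockType.METAMORPHIC,
    RockType.IGNEOUS,
]

# Valid nominal operations: equivalence and set membership.
is_same = samples[0] == samples[2]
in_set = RockType.SEDIMENTARY in {RockType.IGNEOUS, RockType.SEDIMENTARY}

# The mode is the most frequent category.
mode, mode_count = Counter(samples).most_common(1)[0]

# Arithmetic on plain Enum members is rejected, which matches the
# claim that a mean cannot be defined for a nominal attribute.
try:
    RockType.IGNEOUS + RockType.SEDIMENTARY
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
```

A plain `Enum` is a reasonable stand-in here because it supports equality and membership tests but not addition or ordering, so the type itself discourages meaningless arithmetic on the labels.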

Does this sound about right?

Best Answer

Yes, that's right. We usually talk about categorical variables as those having distinct categories as possible values, and split these into ordinal variables, where the possible categories can be ordered (like low, medium, high), and nominal variables, whose categories don't have an ordering (like plate, cup, spoon, fork).
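As a rough illustration of the ordinal/nominal split, the sketch below uses made-up data and only the Python standard library. The names `LEVELS`, `rank`, `ratings`, and `utensils` are assumptions for the example. With an explicit ordering, a median and a maximum become meaningful; without one, only the mode is.

```python
from statistics import median_low, mode

# Ordinal: the categories have a meaningful order, declared explicitly.
LEVELS = ["low", "medium", "high"]
rank = {level: i for i, level in enumerate(LEVELS)}

ratings = ["low", "high", "medium", "high", "low", "high", "medium"]

# The median is defined because the observations can be ranked.
# median_low always returns an actual data point, so it maps back to a label.
median_level = LEVELS[median_low([rank[r] for r in ratings])]
highest_level = max(ratings, key=rank.__getitem__)

# Nominal: no meaningful order, so only the mode is a sensible summary.
# (Sorting these strings alphabetically would work mechanically, but the
# resulting order says nothing about the utensils themselves.)
utensils = ["plate", "cup", "spoon", "fork", "cup"]
most_common_utensil = mode(utensils)
```

Note that the ranks are used only for comparison, never for arithmetic: the distance between "low" and "medium" is not assumed to equal the distance between "medium" and "high".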
