Solved – Should I use aggregate or disaggregate data

categorical data, disaggregation, regression

I have this dataset of flows:

Source entity | Dest entity | Traffic | Cost | Source location | Dest location | Direction | more independent variables (mostly nominal)

There is also a unit price ($/unit traffic) of each entity which comes from a discrete set {p1,p2,p3} and is ordinal/continuous. I want to model this unit price using regression analysis.

Now the question I'm facing is that the price is assigned to an entity (which can be the source or the destination in the table above), not to a flow, which is what each row represents. I'm assuming that flows are independent of each other (i.i.d.).

Would it be wise to attribute the unit price of an entity to a flow?

I know there is the option of aggregating the data by entity, but I'm afraid that could be disadvantageous because:

  1. It is likely that information would be lost
  2. Dependency is introduced among rows

Also, which models could make sense? I'm inclined towards regression because of its simplicity.

Appreciate any help/references here. Thanks

PS. I'm already confused while dealing with this many nominal variables.

Best Answer

In addition to loss of statistical power, aggregation also adds potentially severe bias, particularly when there are non-linear associations between variables. As a rule, I vehemently oppose aggregation, unless there is a theoretical justification for doing so.

The non-linear association issue comes up all over the place in actual data in my experience.
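A small simulation can make the non-linearity point concrete. This is only a sketch on made-up data: the entity count, the number of flows per entity, and the relationship y = x² are all assumptions chosen for illustration, not anything taken from the question's dataset. It shows that once rows are averaged within an entity, the aggregated pairs no longer follow the row-level relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 entities with 200 flows each.
n_entities, n_flows = 50, 200
entity = np.repeat(np.arange(n_entities), n_flows)

# Row-level predictor and a noiseless non-linear response (assumed form).
x = rng.uniform(0.0, 4.0, size=entity.size)
y = x ** 2

# Aggregate both variables to entity-level means.
x_bar = np.bincount(entity, weights=x) / n_flows
y_bar = np.bincount(entity, weights=y) / n_flows

# If aggregation were harmless, y_bar would equal x_bar ** 2.
# The gap is the within-entity variance of x, which is never negative.
gap = y_bar - x_bar ** 2
within_var = x.reshape(n_entities, n_flows).var(axis=1)

print("smallest gap across entities:", gap.min())
print("gap matches within-entity variance:", np.allclose(gap, within_var))
```

Even with no noise at all, every aggregated point sits above the true curve, so a model fitted to the entity means would be estimating a different relationship from the one that generated the rows.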

Another example: aggregating individual data to census tracts is more or less meaningless because (1) inferences based on census-tract-level statistics cannot be extended to units other than census tracts without committing ecological, atomistic, or other cross-level fallacies, and (2) generally speaking, neither residents' experiences nor policy/planning responses play out at the level of census tracts. As a contrasting example, aggregating to police precincts when drawing inferences makes good sense, since specific policing policies vary from precinct to precinct.
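The cross-level fallacy can also be sketched with a toy construction. The data below are entirely hypothetical (five groups standing in for tracts, nine individuals each, with slopes picked by hand), and the point is only that a slope estimated on group means can have the opposite sign to the slope that holds inside every group.

```python
import numpy as np

# Hypothetical data: 5 groups (think tracts), 9 individuals per group.
groups = np.arange(5)
d = np.linspace(-0.4, 0.4, 9)          # within-group deviations, mean zero
g = np.repeat(groups, d.size)          # group id for each individual
dev = np.tile(d, groups.size)          # deviation for each individual

# Between groups y rises with x; within a group y falls with x.
x = g + dev
y = 2.0 * g - dev

# Group-level (aggregated) means.
x_bar = np.bincount(g, weights=x) / d.size
y_bar = np.bincount(g, weights=y) / d.size

# Slope fitted on the aggregated means.
group_slope = np.polyfit(x_bar, y_bar, 1)[0]

# Slope within groups: remove each group's mean, then fit.
within_slope = np.polyfit(x - x_bar[g], y - y_bar[g], 1)[0]

print("slope on group means:", group_slope)
print("slope within groups: ", within_slope)
```

In this construction the aggregated slope is positive while the within-group slope is negative, so reading the group-level result as a statement about individuals would get the direction wrong.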
