The value of the ratio can be obtained as follows, noting that the discrete sampling in lines 3 and 4 of the code implies 3 possible values for $X_1$, 3 for $X_2$, and 7 for $Y$ conditional on each combination of $X_1$ and $X_2$:
$(N - K)_{\text{old}} = 1000 - 3 = 997$
$(N - K)_{\text{new}} = (3 \times 3 \times 7) - 3 = 63 - 3 = 60$
Ratio of standard errors $= \sqrt{\frac{(N - K)_{\text{new}}}{(N - K)_{\text{old}}}} = \sqrt{\frac{60}{997}} = 0.2453172$
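As a quick check, the arithmetic above can be reproduced in a few lines (Python here purely as illustration):

```python
import math

# Degrees of freedom before and after aggregation (K = 3 coefficients)
df_old = 1000 - 3        # 997: original sample of 1000 observations
df_new = 3 * 3 * 7 - 3   # 60: one cell per (X1, X2, Y) combination

ratio = math.sqrt(df_new / df_old)
print(ratio)  # ≈ 0.2453
```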
Why is $(N – K)$ relevant here? Because it is the denominator in the formula for the OLS estimator $s^2$ of the variance parameter $\sigma_0^2$:
$s^2 = \frac{SSR}{N - K}$
where SSR is the sum of squared residuals. This in turn feeds into the estimator of the variance of the coefficient vector $\hat\beta$, with $X$ as the matrix of independent variables:
$V[\hat\beta \mid X] = \sigma_0^2 (X'X)^{-1}$, estimated by $s^2 (X'X)^{-1}$
and so into the standard errors of the coefficients. These formulae also apply to the WLS coefficients, provided that SSR and $X$ are based on the weighted variables (WLS being equivalent to OLS on the weighted variables).
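A minimal sketch of these formulae on synthetic data (the regressors, true coefficients and seed below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3

# Design matrix: intercept plus two random regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# OLS fit, then s^2 = SSR / (N - K)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (N - K)

# Estimated V[beta_hat | X] = s^2 (X'X)^{-1}; standard errors on its diagonal
cov_beta = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))
print(se)
```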
However, it is important to note (especially for anyone who may be using regression with aggregate data in real applications, such as in economics) that the simple ratio formula above only works because of particular features of this case. In general the effect of aggregation on standard errors is more complex, with effects via the SSR and $X'X$ (which happen to cancel out in this case) needing to be considered as well as those via $(N - K)$.
A relevant feature of this case is that aggregation does not group together different $Y$ values, each combination of $X_1$, $X_2$ and $Y$ forming a separate aggregation. Thus there is no averaging of $Y$ values which would tend to reduce the residuals. Suppose, by contrast, that a sample has two observations of $y$ for each observed $x$ value, and that in each case the two observations happen to lie on opposite sides of the fitted line, the same distance from it. Then regression on the unaggregated data will produce non-zero standard errors of the coefficients, but aggregation at each $x$ value (that is, averaging of its two $y$ observations) will produce a perfect fit with zero residuals and therefore zero standard errors. In that case, therefore, the zero SSR in the aggregate model will dominate any effects via $(N - K)$ and $X'X$.
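This contrast can be demonstrated directly; the toy line $y = x$ and the $\pm 0.5$ offsets below are assumed purely for illustration:

```python
import numpy as np

# Two y observations per x, symmetric about the line y = x
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
y = x + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])

# OLS on the unaggregated data: residuals of +/-0.5, so SSR > 0
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_raw = np.sum((y - X @ b) ** 2)

# Aggregate by averaging the two y values at each x: perfect fit, SSR = 0
xa = np.array([1.0, 2.0, 3.0])
ya = np.array([y[0:2].mean(), y[2:4].mean(), y[4:6].mean()])
Xa = np.column_stack([np.ones_like(xa), xa])
ba, *_ = np.linalg.lstsq(Xa, ya, rcond=None)
ssr_agg = np.sum((ya - Xa @ ba) ** 2)

print(ssr_raw, ssr_agg)  # non-zero vs (numerically) zero
```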
This question is ancient, but it seems like you are looking for ordinal regression. Basically, for your 9 ordinal categories, you want to make 8 classifiers. The first classifier would be "is the category greater than 1 or less than or equal to 1?" The second classifier would be "is the category greater than 2, or less than or equal to 2?", etc.
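Constructing those binary targets is straightforward; the labels below are hypothetical, and the choice of classifier to train on each target is left open:

```python
import numpy as np

# Hypothetical ordinal labels in 1..9
y = np.array([1, 3, 5, 9, 2, 7, 4])

# One binary target per threshold k = 1..8: "is the category greater than k?"
binary_targets = {k: (y > k).astype(int) for k in range(1, 9)}

print(binary_targets[2])  # [0 1 1 1 0 1 1]
```

Each of the 8 targets is then fitted with an ordinary binary classifier, and the per-threshold probabilities are combined to score the 9 categories.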
If you're still looking for a solution to the problem, I'd look at the literature on ordinal regression. It may fit your problem better than aggregating your data or doing more complicated techniques first.
Best Answer
Okay, I found out that what I was looking for is called hierarchical linear models (see Wikipedia). Just dropping that here in case someone else encounters a similar problem.