Normalization – How to Normalize a Data Set Using Z-Scores from a Related Data Set


I am trying to come up with a normalization for a set of data. My company puts our products through a series of testers (people) who score the products on key metrics. I can access all records from each tester, and when I look at the averages, I find that some testers consistently score well above the overall average (by more than one standard deviation) and some consistently score well below it. Since we only have 3-4 testers on each product, this variation between testers could affect our decisions about products.

I was able to calculate a z-score for each tester, and I was thinking about using it to create some form of normalization factor that I could apply to their raw scores. However, I doubted that a plain z-score transformation was the right approach, because this process uses a z-score from a different data set (the testers' histories). I want the normalization to be based on the testers' scores and then applied to each product's overall score.

Is this the right approach? I'm not the best at statistics so maybe there is a totally different method that works better.

Thank you

Further info:
Testers will have already tested hundreds of products (depending on how long they've been on staff). Products are randomly assigned to testers, and testers never test the same product twice. They typically score each product on 8 key areas, on a scale of 1-5. We take the average of these scores as the tester's final score (average) for the product. The product is then sent to other testers, who produce their own averages. All the testers' scores are averaged together to get the product's final score.

I know sometimes averaging averages can be bad, but in this case, the scale of ratings (1-5) never changes, so I believe it works here.
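The scoring pipeline just described can be sketched in a few lines, with hypothetical ratings for a single product. The sketch assumes every tester rates all 8 key areas; in that case each tester's average carries the same weight (8 ratings), so the average of the testers' averages equals the plain average of all individual ratings. If testers skipped some areas, the two numbers would generally differ.

```python
from statistics import mean

# Hypothetical ratings for one product: each tester scores the same 8 key areas on a 1-5 scale.
ratings = {
    "tester_a": [4, 5, 4, 4, 3, 5, 4, 4],
    "tester_b": [3, 3, 4, 2, 3, 3, 4, 3],
    "tester_c": [4, 4, 3, 4, 4, 3, 4, 5],
}

# Step 1: each tester's final score is the average over the 8 key areas.
tester_scores = {t: mean(s) for t, s in ratings.items()}

# Step 2: the product's final score is the average of the testers' scores.
final_score = mean(list(tester_scores.values()))

# With equal numbers of ratings per tester, the "average of averages"
# equals the plain average of all individual ratings.
all_ratings = [r for s in ratings.values() for r in s]
assert abs(final_score - mean(all_ratings)) < 1e-12
print(tester_scores, round(final_score, 3))
```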

Best Answer

I interpret your approach as fitting a simple model to existing data. Subsequently you apply the fitted model to predict parameters for new products, which is perfectly OK.

This is the model, in technical terms:

  • Each product $i$ has a hidden "true" quality measure $x_i$ (its z-score).
  • For a product chosen at random from the (hypothetical) population, consisting of the known products and all possible future ones, we get a random quality measure $X$. This random variable has mean $\mathrm{E}X = 0$ and variance $\mathrm{Var}\,X = 1$, but is not necessarily normally distributed.
  • When product $i$ is tested by tester $j$, it is assigned an average score $Y_{ij}$ within $[1, 5]$. This average score depends on the tester: some tend to give higher scores, and some tend to give very similar scores to almost all products. On top of that there are random fluctuations. Let $\mu_j$ be the (theoretical) mean score given by tester $j$, $\sigma_j$ a scale factor, and $E_{ij}$ the random fluctuations (with mean $0$); then $$ Y_{ij} = \sigma_j x_i + \mu_j + E_{ij}.$$

You now proceed by estimating $\mu_j$ and $\sigma_j$ for each tester from the data, and for a new product you predict the score $x$ from the observed $y_{ij}$. I don't know exactly how you are doing this, but you have probably noticed that you get different results depending on $j$ when you simply invert the model and calculate $$ \hat{x}_{i(j)} = \frac{y_{ij} - \hat\mu_j}{\hat\sigma_j}. $$ As a quick fix for these discrepancies, I suggest simply taking the average of these fitted values as your guess for the true z-score, $$ \hat{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i}\frac{y_{ij} - \hat\mu_j}{\hat\sigma_j}. $$ Here $j=1,\dots, n_i$ indexes the testers that scored product $i$.
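The model $Y_{ij} = \sigma_j x_i + \mu_j + E_{ij}$ and the averaging fix can be tried out on simulated data. This is only a sketch under stated assumptions: the tester parameters, the noise level `NOISE_SD`, the history size, and the helper names `score` and `estimate_x` are hypothetical; simulated scores are not clipped to the 1-5 scale; and $\hat\mu_j$, $\hat\sigma_j$ are simply each tester's sample mean and sample standard deviation. If $X$ and $E_{ij}$ are independent, $\mathrm{Var}\,Y_{ij} = \sigma_j^2 + \mathrm{Var}\,E_{ij}$, so this sample standard deviation somewhat overstates $\sigma_j$ and shrinks $\hat{x}_i$ toward $0$.

```python
import random
from statistics import mean, stdev

random.seed(0)

N_TESTERS = 6    # hypothetical number of testers
N_HISTORY = 300  # hypothetical number of past products per tester
NOISE_SD = 0.2   # assumed standard deviation of the fluctuations E_ij

# Hypothetical tester parameters: mu_j (how generous) and sigma_j (how strongly
# scores react to true quality; small sigma_j = similar scores for everything).
mu = [random.uniform(2.5, 4.0) for _ in range(N_TESTERS)]
sigma = [random.uniform(0.3, 0.8) for _ in range(N_TESTERS)]


def score(x, j):
    """Simulate Y_ij = sigma_j * x_i + mu_j + E_ij (not clipped to the 1-5 scale)."""
    return sigma[j] * x + mu[j] + random.gauss(0.0, NOISE_SD)


# Estimate mu_j and sigma_j from each tester's history. True z-scores are drawn
# from N(0, 1) for convenience; the model only needs mean 0 and variance 1.
mu_hat, sigma_hat = [], []
for j in range(N_TESTERS):
    ys = [score(random.gauss(0.0, 1.0), j) for _ in range(N_HISTORY)]
    mu_hat.append(mean(ys))
    sigma_hat.append(stdev(ys))  # estimates sqrt(sigma_j**2 + NOISE_SD**2), not sigma_j


def estimate_x(observed):
    """Average of the per-tester inversions (y_ij - mu_hat_j) / sigma_hat_j.

    `observed` maps tester index j to the observed average score y_ij of one product.
    """
    return mean((y - mu_hat[j]) / sigma_hat[j] for j, y in observed.items())


# A new product with true z-score 1.0, scored by 3 randomly chosen testers.
true_x = 1.0
observed = {j: score(true_x, j) for j in random.sample(range(N_TESTERS), 3)}
print("raw average score:", round(mean(observed.values()), 2))
print("estimated z-score:", round(estimate_x(observed), 2))
```

Raising `NOISE_SD` relative to the $\sigma_j$ values makes the shrinkage of $\hat{x}_i$ toward $0$ more visible.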

Your problem goes under the name inverse estimation or inverse prediction. Only a few threads here on CV deal with this topic, possibly because methods for inverse estimation are less commonly used (and taught) than ordinary regression. Typically these threads concern normally distributed data, with no replicates for $i$ and no experimental design like yours; this is also the case in this post with an answer by @kjetilbhalvorsen. You can find an introduction to the topic in Greenwell and Kabban (2014), "investr: An R Package for Inverse Estimation" (The R Journal, Vol. 6/1).
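For readers new to the term, the classical calibration problem is the simplest case of inverse estimation: fit $y = \beta_0 + \beta_1 x$ by least squares on data where $x$ is known, then solve for the unknown $x_0$ behind new observations, $\hat{x}_0 = (\bar{y}_0 - \hat\beta_0)/\hat\beta_1$. The Python sketch below computes only this point estimate on made-up data (the numbers and the function names `fit_line` and `inverse_estimate` are hypothetical); it does not reproduce the R package investr, which also provides confidence intervals. Unlike the testers setting, the $x$ values here are known when the line is fitted.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def inverse_estimate(y0_values, b0, b1):
    """Classical calibration estimate: solve y = b0 + b1 * x at the mean of the new observations."""
    return (mean(y0_values) - b0) / b1


# Hypothetical calibration data: known x values and measured responses y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 2.1, 3.9, 6.2, 7.9, 10.1]
b0, b1 = fit_line(xs, ys)

# Three replicate measurements of a sample with unknown x.
x0_hat = inverse_estimate([5.0, 5.2, 4.9], b0, b1)
print(round(b0, 3), round(b1, 3), round(x0_hat, 3))
```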


Sorry, I accidentally posted a comment as an answer originally (which kind of forced me to write an answer :-). Here is the first version (needed to understand the first comments on this answer): "Welcome to cv, Citanaf! Can you give us a rough idea about the dimension of the data you have? How many products have testers already tested, typically?"