Solved – How to apply Box Cox to train and test data

data transformation, normalization, prediction, skewness, standardization

I am trying to standardize my data before performing prediction on it.

Some of the features in my data are skewed, so I am applying a Box-Cox transformation to reduce the skewness.

My data also contains negative values and zeros. Since the Box-Cox transformation is defined only for strictly positive values, I shift each feature so that all of its values are positive.

Using: F[i] = F[i] + 1 - min(F), where F is one of my features.
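As a minimal sketch of this shift in Python (the helper name `shift_to_positive` and the sample values are made up for illustration), the constant can be returned alongside the shifted values so that it is available for later reuse:

```python
def shift_to_positive(values):
    """Shift a feature so that its minimum becomes 1: F[i] + 1 - min(F)."""
    offset = 1 - min(values)  # the shifting constant
    return [v + offset for v in values], offset


# Made-up feature containing a negative value and a zero.
feature = [-3.0, 0.0, 2.5, 7.0]
shifted, offset = shift_to_positive(feature)
print(shifted, offset)
```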

Please note that my train and test data sets are different and have different means.

I need to apply the same transformation to both the train and the test data sets.

How do I apply it to the train and test data sets?

1) Should I apply Box-Cox to the train data set, capture the parameters (the shifting constant used for the train data set and lambda), and use the same parameters for the test data set?

OR

2) Should I apply Box-Cox to the train and test data sets independently, without considering the train data set parameters when applying Box-Cox to the test data set?

Best Answer

The data in the training and test sets should have the same meaning. Whether you standardize the data based on the mean and standard deviation or use a Box-Cox transformation, you should apply to the test set the means/SDs or lambda calculated on the training set. This corresponds to option 1 in the question.
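The following is a minimal sketch of that idea in plain Python, not a definitive implementation. The function names (`fit_boxcox`, `apply_boxcox`), the grid of candidate lambdas, and the sample data are assumptions made for illustration. Lambda is chosen by a simple grid search over the Box-Cox profile log-likelihood on the training data only, and the shifting constant and lambda are then reused unchanged on the test data.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of one strictly positive value for a given lambda."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def log_likelihood(values, lam):
    """Box-Cox profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    transformed = [boxcox(v, lam) for v in values]
    mean = sum(transformed) / n
    var = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox(train):
    """Estimate the shift and lambda from the TRAINING data only."""
    offset = 1.0 - min(train)  # same shift as in the question
    shifted = [v + offset for v in train]
    grid = [i / 100.0 for i in range(-200, 201)]  # assumed search range [-2, 2]
    lam = max(grid, key=lambda l: log_likelihood(shifted, l))
    return offset, lam


def apply_boxcox(values, offset, lam):
    """Apply an already fitted shift and lambda to any data set."""
    shifted = [v + offset for v in values]
    if min(shifted) <= 0:
        # A test value below the training minimum can break the shift.
        raise ValueError("shifted value is not positive; the training shift is too small")
    return [boxcox(s, lam) for s in shifted]


# Made-up data for illustration.
train = [-2.0, 0.0, 1.0, 3.0, 8.0, 20.0]
test = [-1.0, 0.5, 4.0, 12.0]

offset, lam = fit_boxcox(train)               # parameters come from train only
train_t = apply_boxcox(train, offset, lam)
test_t = apply_boxcox(test, offset, lam)      # same parameters reused on test
print(offset, lam, test_t)
```

One caveat the sketch makes explicit: if the test set contains a value smaller than the training minimum by 1 or more, the training shift no longer makes it positive, so that case has to be handled deliberately (for example with a larger shift chosen up front). In practice, `scipy.stats.boxcox` can estimate lambda when called without `lmbda` and apply a fixed one when `lmbda` is passed, which fits the same fit-on-train, apply-to-test pattern.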
