Solved – box-cox transformation altered the anova result

anova, data transformation

Here is a summary of my actual data

> summary(data_licor_sub)
              Date    Block     Tree.ID                  Cross       Rank         Log.ID     
 Fri Aug 29 2014:24   1:96   17578  : 32   180-1 x MAX1     :56   short:160   Min.   : 1.00  
 Thr Aug 28 2014:80   2:94   17786  : 32   288-5 x MAX3     :46   tall :150   1st Qu.:18.00  
 Tue Aug 26 2014:80   3:96   17844  : 32   D125 x K1        :64               Median :36.00  
 Tue Sep  2 2014:40   4:24   17544  : 24   SO443WL x SO479WL:48               Mean   :38.35  
 Wed Aug 27 2014:86          17757  : 24   SO450WL x SO465WL:48               3rd Qu.:57.75  
                             17961  : 24   SO459WL x SO469WL:48               Max.   :86.00  
                             (Other):142                                                     
      HHMMSS        FTime             EBal.       Photo             Cond         Temp.group
 10:10:12:  1   Min.   :  275.5   Min.   :0   Min.   :-9.905   Min.   :0.07728   30:154    
 10:10:15:  1   1st Qu.: 6102.2   1st Qu.:0   1st Qu.:-5.320   1st Qu.:0.17907   35:156    
 10:12:51:  1   Median : 9855.5   Median :0   Median :11.233   Median :0.23381             
 10:12:54:  1   Mean   : 9529.0   Mean   :0   Mean   : 7.545   Mean   :0.26349             
 10:26:29:  1   3rd Qu.:13077.2   3rd Qu.:0   3rd Qu.:19.671   3rd Qu.:0.31512             
 10:26:32:  1   Max.   :18499.0   Max.   :0   Max.   :28.378   Max.   :1.01373             
 (Other) :304                                                                              
      Tair           Tleaf       Light.Dark     Cond.log       
 Min.   :26.40   Min.   :28.18   Dark :154   Min.   :-3.69372  
 1st Qu.:29.54   1st Qu.:29.46   Light:156   1st Qu.:-2.48140  
 Median :35.28   Median :31.96               Median :-2.09658  
 Mean   :34.64   Mean   :32.14               Mean   :-2.07202  
 3rd Qu.:39.23   3rd Qu.:34.90               3rd Qu.:-1.66605  
 Max.   :41.57   Max.   :36.11               Max.   : 0.01967  

Basically, I am interested in how photosynthesis and conductance (respiration) vary with genotype (Tree.ID), family (Cross), height (Rank), and temperature (Temp.group).

Before doing any analysis, I checked my two response variables (Photo and Cond) for normality and found that they are not normally distributed (I am only showing "Cond" here). Because of that, I decided to transform the data with a Box-Cox transformation. Although the transformation worked really well, it changed my ANOVA result.

Here is my initial ANOVA result:

> r1 <- aov(Cond ~ Temp.group + Cross + Rank + Tree.ID, data_licor_sub)

> summary(r1)
             Df Sum Sq Mean Sq F value   Pr(>F)    
Temp.group    1 0.3971  0.3971  39.755 1.04e-09 ***
Cross         5 1.1798  0.2360  23.624  < 2e-16 ***
Rank          1 0.0082  0.0082   0.824    0.365    
Tree.ID       5 0.7318  0.1464  14.653 7.61e-13 ***
Residuals   297 2.9665  0.0100                     

However, after applying the Box-Cox transformation with an appropriate lambda, this is what I get:

> r2 <- aov(Cond^-0.18181818 ~ Temp.group + Cross + Rank + Tree.ID, data_licor_sub)
> summary(r2)
             Df Sum Sq Mean Sq F value   Pr(>F)    
Temp.group    1 0.3106 0.31063  59.045 2.26e-13 ***
Cross         5 1.1012 0.22023  41.863  < 2e-16 ***
Rank          1 0.0324 0.03237   6.153   0.0137 *  
Tree.ID       5 0.4203 0.08406  15.978 6.05e-14 ***
Residuals   297 1.5625 0.00526                     

As you can see, the variable "Rank" was not significant before but became significant after the transformation. Am I missing something?

Here is a plot of the data before and after the transformation, to show that the transformation worked well:

[Figure: Box-Cox transformation result]

I also tried a log transformation, as suggested in one of the comments, and here is the result. It looks more or less the same as the Box-Cox transformation.

> r3 <- aov(Cond.log ~ Temp.group + Cross + Rank + Tree.ID, data_licor_sub)
> summary(r3)
             Df Sum Sq Mean Sq F value   Pr(>F)    
Temp.group    1  11.52  11.520  57.802 3.83e-13 ***
Cross         5  40.30   8.061  40.444  < 2e-16 ***
Rank          1   1.08   1.082   5.427   0.0205 *  
Tree.ID       5  16.34   3.269  16.401 2.71e-14 ***
Residuals   297  59.19   0.199                     

[Figure: Log transformation]

Best Answer

First, unrelated to the question you asked, I’m not sure why you are running an ANOVA on your linear regression model. Some clarification about your variables and your goals for running this analysis might be helpful to those trying to help with your issue.

Second, it is not surprising to me that you are getting different results after the transformation. Remember the purpose of the transformation in the first place. Both ANOVA and linear regression are parametric methods, and one of the assumptions they rely on is normality (strictly speaking, of the model errors rather than of the raw response). The goal of the transformation is to bring the distribution closer to normal. Sometimes the transformation will remove significant results, sometimes it will not change the results, and sometimes, as in your case, it will bring results into significance. I have had all of these happen in my own transformations. Field (Discovering Statistics Using R, 4th, 2012, p. 193) has a good review of the purpose of, and the debate about, transforming data. He points out that transforming data can be problematic, especially in the case of ANOVA, and recommends understanding carefully why and how you are transforming the variables, as well as considering robust methods (see below) instead of transformations in some circumstances. I highly recommend his very brief review of this issue, in which he refers to several articles on the topic, such as Games, 1984, Psychological Bulletin 95(2): 345-47, "Data transformations, power, and skew: A rebuttal to Levine and Dunlap."
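To make the point above concrete, here is a minimal sketch on simulated data. Everything in it is hypothetical (the data frame `d`, the factors `group` and `height`, and the effect sizes are made up, not taken from the question). It fits the same model to a right-skewed response on the raw and on the log scale, uses `boxcox()` from the MASS package to pick a lambda from a grid, and prints the p-values side by side so you can see how they shift with the scale of the response.

```r
# Hypothetical simulated data: none of these names or values come from the question.
library(MASS)  # provides boxcox()

set.seed(1)
n <- 60
d <- data.frame(
  group  = factor(rep(c("a", "b", "c"), each = n)),
  height = factor(rep(c("short", "tall"), times = 3 * n / 2))
)

# Right-skewed response whose effects are additive on the log scale
mu <- c(a = 0, b = 0.3, c = 0.6)[as.character(d$group)] +
  ifelse(d$height == "tall", 0.15, 0)
d$y <- exp(-1.5 + mu + rnorm(nrow(d), sd = 0.5))

# Same model, two response scales
fit_raw <- aov(y ~ group + height, data = d)
fit_log <- aov(log(y) ~ group + height, data = d)

# Profile log-likelihood over a grid of lambda values (no plot drawn)
bc <- boxcox(y ~ group + height, data = d, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# p-values for group and height (the last entry is NA for the residual row)
p_raw <- summary(fit_raw)[[1]][["Pr(>F)"]]
p_log <- summary(fit_log)[[1]][["Pr(>F)"]]

print(lambda_hat)
print(cbind(raw = p_raw, log = p_log))
```

The F tests answer a slightly different question on each scale (additive differences in `y` versus additive differences in `log(y)`), so there is no reason to expect the p-values to agree.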

Third, if you are actually comparing different groups (a presumption for ANOVA), then you should make sure that the distribution within each group is normal, not just the distribution of the whole dataset. You may be doing this already, but the information you provide does not make that clear. (Field p. 412: “In terms of normality, what matters is that distribution within groups are normally distributed”). This of course assumes that you are also checking the other assumptions, both across the data and within groups, such as homogeneity of variance, which were not shown above.
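As a rough sketch of the checks described above, again on hypothetical simulated data (the names `d`, `group`, and `y` are made up), you could test the model residuals, each group separately, and the equality of variances using functions from base R's stats package:

```r
# Hypothetical simulated data: three groups with a right-skewed response.
set.seed(2)
d <- data.frame(group = factor(rep(c("a", "b", "c"), each = 40)))
d$y <- exp(rnorm(nrow(d), mean = as.integer(d$group) / 4, sd = 0.4))

fit <- aov(y ~ group, data = d)

# Shapiro-Wilk test on the model residuals
sw_resid <- shapiro.test(residuals(fit))

# Shapiro-Wilk test within each group separately
sw_group <- tapply(d$y, d$group, function(v) shapiro.test(v)$p.value)

# Bartlett's test of homogeneity of variance across groups
bt <- bartlett.test(y ~ group, data = d)

print(sw_resid$p.value)
print(sw_group)
print(bt$p.value)
```

Bartlett's test is itself sensitive to non-normality, so `fligner.test()` (also in stats) or a Q-Q plot of the residuals via `qqnorm(residuals(fit))` may be a useful cross-check.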

Fourth, you might consider robust methods (Field, p. 441+) instead of transformations. For example, see the R package “robust” or the function "rlm" (from the MASS package).
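A minimal sketch of the "rlm" route, using hypothetical simulated data with a few planted outliers (the names `d`, `group`, and `y` are made up for illustration). It compares ordinary least squares with the default Huber M-estimator from `MASS::rlm()`:

```r
# Hypothetical simulated data with three artificial outliers.
library(MASS)  # provides rlm()

set.seed(3)
d <- data.frame(group = factor(rep(c("a", "b"), each = 50)))
d$y <- 1 + 0.5 * (d$group == "b") + rnorm(100, sd = 0.3)
d$y[1:3] <- d$y[1:3] + 5  # contaminate group "a" with outliers

fit_ols <- lm(y ~ group, data = d)   # ordinary least squares
fit_rob <- rlm(y ~ group, data = d)  # robust fit (Huber M-estimation by default)

print(coef(fit_ols))
print(coef(fit_rob))

# Final IWLS weights: observations with large residuals are downweighted
print(head(fit_rob$w))
```

The robust fit keeps the response on its original scale, which can make the coefficients easier to interpret than those from a transformed model.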
