Solved – How to interpret the output of Cox regression with categorical nominal variables in SPSS

cox-model, regression, spss

When I enter a nominal categorical variable as an independent variable in the Cox regression procedure, SPSS gives cryptic results. It reports something like the following:

               P            OR
Variable       0.000        --
Variable(1)    0.000        21.004
Variable(2)    0.005        5.074
Variable(3)    0.450        1.120
Variable(4)    0.000        15.620
Variable(5)    0.000        0.001

I don't know which variable level is the reference for the comparisons, or which level is being compared with which.

  1. It seems that Variable is the reference, and Variable(#) is being compared to it. Am I correct? Please let me know what is being compared with what when the variable is nominal. Does SPSS incorrectly try to calculate a mean for the nominal variable and use it as the reference? Or does it compare each level to one level chosen by the user (e.g., the last level, or a particular level specified by the user)?

  2. Why does Variable (which apparently is the reference level) have a P value despite not having a Beta or an OR? Is it an intercept? What is it?

  3. How can I tell exactly which levels Variable(2) or Variable(5) refer to? I need to know the exact names of the levels as they were originally labeled or entered as strings. SPSS refuses to use the original level names. It does the same whether I enter a numeric variable as categorical or a string variable as the independent variable. Note that my numeric variable is properly labeled.

SPSS help and Google were not of much help in this regard.


The SPSS code is:

COXREG Time
  /STATUS=Failed(1)
  /PATTERN BY Variable
  /CONTRAST (Variable)=Indicator
  /METHOD=ENTER Variable 
  /PLOT SURVIVAL HAZARDS LML OMS
  /PRINT=CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).

And this is the output:

                  B      SE     Wald   df   Sig.   Exp(B)
Variable                      52.892    5   .000
Variable(1)   2.053    .417   24.229    1   .000    7.795
Variable(2)    .757    .379    3.986    1   .046    2.132
Variable(3)   2.547    .475   28.712    1   .000   12.771
Variable(4)    .169    .373     .205    1   .650    1.184
Variable(5)   -.456    .374    1.486    1   .223     .634

This table might be a key to understanding the above table:

Categorical Variable Codings a                          
                       Frequency    (1) (2) (3) (4) (5)
Variable b  1=level1    10          1   0   0   0   0
            2=level2    18          0   1   0   0   0
            3=level3    8           0   0   1   0   0
            4=level4    9           0   0   0   1   0
            5=level5    11          0   0   0   0   1
            6=level6    12          0   0   0   0   0
a Category variable: Variable                           
b Indicator Parameter Coding        

Best Answer

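The reason for this old-fashioned and confusing presentation is that when a covariate is declared categorical, SPSS sets aside the original values and labels and recodes the variable into dummy (indicator) variables. The original values entered by the user could be 1, 5, 12, 60, so SPSS disregards them, creates K-1 dummy variables for a variable with K levels, and gives them its own names: Variable(1), Variable(2), and so on. The Categorical Variable Codings table is the key to reading them. In the table in the question, level6 has 0 in every column, so it is the omitted (reference) level, and Variable(1) through Variable(5) compare level1 through level5 with it. The row labeled Variable, which has 5 degrees of freedom and no B or Exp(B), is the overall test of the variable as a whole, not an intercept. It would be more user-friendly if SPSS used the level names of the original variable and told the user which level is the omitted one.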
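To make the recoding concrete, here is a minimal sketch in Python (not SPSS syntax) of indicator coding with the last level as the reference, which is what the coding table in the question shows. The function name `indicator_coding` and the labels `level1` to `level6` are illustrative assumptions taken from that table, not anything SPSS exposes.

```python
def indicator_coding(levels, reference=None):
    """Build K-1 indicator (dummy) columns for a K-level nominal variable.

    Returns (coding, columns): coding maps each level to its row of 0/1
    values, and columns lists the non-reference levels in column order.
    """
    if reference is None:
        # Assumption: the last level is the reference, as in the question's table.
        reference = levels[-1]
    columns = [lv for lv in levels if lv != reference]
    coding = {lv: [1 if lv == col else 0 for col in columns] for lv in levels}
    return coding, columns


levels = ["level1", "level2", "level3", "level4", "level5", "level6"]
coding, columns = indicator_coding(levels)

# The reference level is the one whose row is all zeros.
reference = next(lv for lv, row in coding.items() if not any(row))

# Map the generic output names back to the original labels.
for i, name in enumerate(columns, start=1):
    print(f"Variable({i}) = {name} vs {reference}")
```

Each printed line pairs one of the generic row names with the level it represents and the level it is compared against.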

When plotting the variable levels, SPSS is perfectly able to use the user's labels as plot legends. So if it can recognize the variable levels, why does it not use them when reporting its tables? Perhaps there is little incentive to improve the product.

It would also be more user-friendly if the SPSS help file said something like the above. Instead, it gives only a brief explanation of contrasts, which are what determine, rather obscurely, which level is omitted and used as the reference.
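Once the reference level is known, each row of the output can be read as a hazard ratio relative to it, since Exp(B) is simply the exponential of B. The following is a small sketch in Python using the B values from the question's output. The pairing of Variable(1) to Variable(5) with level1 to level5, and level6 as the reference, is read off the coding table in the question; the variable names are illustrative.

```python
import math

# Reference level: the all-zero row in the question's coding table.
reference = "level6"

# B coefficients copied from the question's output, keyed by the level
# that each Variable(i) row stands for.
coefficients = {
    "level1": 2.053,
    "level2": 0.757,
    "level3": 2.547,
    "level4": 0.169,
    "level5": -0.456,
}

# Exp(B) is the hazard ratio of each level relative to the reference level.
hazard_ratios = {level: math.exp(b) for level, b in coefficients.items()}

for level, hr in hazard_ratios.items():
    print(f"{level} vs {reference}: B = {coefficients[level]:+.3f}, Exp(B) = {hr:.3f}")
```

The computed values agree with the Exp(B) column only up to rounding, because the B values in the output are themselves rounded to three decimals. A value above 1 means a higher hazard than the reference level, and a value below 1 means a lower hazard.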

This site is a good read.
