When I enter a nominal categorical variable as an independent variable in the Cox regression procedure, SPSS gives cryptic results. It just reports something like the following:
              P      OR
Variable      0.000  --
Variable(1)   0.000  21.004
Variable(2)   0.005  5.074
Variable(3)   0.450  1.120
Variable(4)   0.000  15.620
Variable(5)   0.000  0.001
I don't know which variable level is the reference for the comparisons, and which variable level is being compared with which other variable level.
- It seems that Variable is the reference and Variable(#) is being compared to it. Am I correct? Please let me know what is being compared with what when the variable is nominal. Is SPSS incorrectly trying to calculate the mean of the nominal variable and use it as the reference? Or does it compare each level to one level chosen by the user (e.g., the last level, or a particular level specified)?
- Why does Variable (which apparently is the reference level) have a P value despite not having a Beta or an OR? Is it an intercept? What is it?
- How can I tell exactly which levels Variable(2) or Variable(5) are? I need the exact names of the levels as originally labeled or entered as strings. SPSS refuses to use the original level names. It does the same whether I enter a numeric variable as categorical or a string variable as the independent variable. Note that my numeric variable is properly labeled.
SPSS help and Google were not of much help in this regard.
The SPSS code is:
COXREG Time
  /STATUS=Failed(1)
  /PATTERN BY Variable
  /CONTRAST (Variable)=Indicator
  /METHOD=ENTER Variable
  /PLOT SURVIVAL HAZARDS LML OMS
  /PRINT=CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
And this is the output:
              B      SE    Wald    df  Sig.  Exp(B)
Variable                   52.892  5   .000
Variable(1)   2.053  .417  24.229  1   .000  7.795
Variable(2)   .757   .379  3.986   1   .046  2.132
Variable(3)   2.547  .475  28.712  1   .000  12.771
Variable(4)   .169   .373  .205    1   .650  1.184
Variable(5)   -.456  .374  1.486   1   .223  .634
This table might be a key to understanding the above table:
Categorical Variable Codings [a]
                        Frequency  (1)  (2)  (3)  (4)  (5)
Variable [b]  1=level1  10         1    0    0    0    0
              2=level2  18         0    1    0    0    0
              3=level3  8          0    0    1    0    0
              4=level4  9          0    0    0    1    0
              5=level5  11         0    0    0    0    1
              6=level6  12         0    0    0    0    0
[a] Category variable: Variable
[b] Indicator Parameter Coding
Best Answer
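SPSS presents the information in this old-fashioned and confusing way because, once a covariate is declared categorical, it stops using the variable's own values and labels in the coefficient table and treats it purely as a grouping variable coded by dummy variables. The original levels entered by the user could be 1, 5, 12, 60; it makes no difference. For a variable with K levels, SPSS creates K-1 new indicator variables and gives them its own names: Variable(1), Variable(2), and so on. The only place the mapping appears is the Categorical Variable Codings table: the column headed (k) has a 1 in the row of the level that Variable(k) stands for, and the level whose row is all zeros is the omitted (reference) level. In your output that is level6, so Variable(1) is level1 versus level6, Variable(2) is level2 versus level6, and so on. The row labeled plain Variable, with df = 5 and no B, is not a reference level or an intercept; it is the overall Wald test of the whole variable. It would be more user-friendly if SPSS used the original level names in the coefficient table and stated outright which level was omitted.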
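The Categorical Variable Codings table in the question can be reproduced in a few lines, which may make the naming clearer. This is only a minimal sketch in Python, not what SPSS runs internally: `indicator_coding` and `row_labels` are hypothetical names, and the sketch assumes the last level is the reference, as the all-zero row for level6 in the question suggests.

```python
# Sketch of indicator (dummy) coding for a K-level categorical variable.
# Assumption: the last level is the reference (its row is all zeros).

def indicator_coding(levels, reference=None):
    """Return (coding, columns, reference) with K-1 indicator columns."""
    if reference is None:
        reference = levels[-1]  # assumed default: last category
    # One indicator column per non-reference level, in the original order.
    columns = [lv for lv in levels if lv != reference]
    coding = {lv: [1 if lv == col else 0 for col in columns] for lv in levels}
    return coding, columns, reference


levels = ["level1", "level2", "level3", "level4", "level5", "level6"]
coding, columns, reference = indicator_coding(levels)

# Translate the SPSS-style row names back to the original level labels.
row_labels = {
    f"Variable({k})": f"{lv} vs {reference}"
    for k, lv in enumerate(columns, start=1)
}

for lv in levels:
    print(lv, coding[lv])
for name, meaning in row_labels.items():
    print(name, "=", meaning)
```

Passing a different `reference` to the hypothetical function shifts which row is all zeros, which is the only thing that changes what each Variable(k) row is compared against.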
When plotting the variable levels, SPSS is clearly able to use the user's labels as the plot legend. So if it knows the level labels, why does it not use them in its tables? Perhaps there is little incentive to improve the product.
It would also help if the SPSS help file said something like this. Instead, it gives only a brief explanation of the contrasts, which are what determine, rather vaguely, which level is omitted and used as the reference.
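Once the reference level is known, the coefficient table in the question can be read row by row. The sketch below is a hedged illustration in Python using the numbers copied from that table: it pairs each Variable(k) row with its original level, taking level6 as the reference because its row in the codings table is all zeros, and checks that Exp(B) is exp(B) and that Wald is (B/SE) squared, up to rounding in the printed output. The names `rows`, `hazard_ratio`, and `wald_statistic` are made up for the example.

```python
import math

# Reference level, read off the all-zero row of the codings table.
reference = "level6"

# (SPSS row name, original level, B, SE, reported Wald, reported Exp(B))
rows = [
    ("Variable(1)", "level1", 2.053, 0.417, 24.229, 7.795),
    ("Variable(2)", "level2", 0.757, 0.379, 3.986, 2.132),
    ("Variable(3)", "level3", 2.547, 0.475, 28.712, 12.771),
    ("Variable(4)", "level4", 0.169, 0.373, 0.205, 1.184),
    ("Variable(5)", "level5", -0.456, 0.374, 1.486, 0.634),
]


def hazard_ratio(b):
    """Exp(B): hazard of this level relative to the reference level."""
    return math.exp(b)


def wald_statistic(b, se):
    """Single-coefficient Wald chi-square statistic (1 df)."""
    return (b / se) ** 2


results = []
for name, level, b, se, wald_reported, hr_reported in rows:
    hr = hazard_ratio(b)
    wald = wald_statistic(b, se)
    results.append((name, level, hr, wald))
    print(f"{name}: {level} vs {reference}, "
          f"HR = {hr:.3f} (reported {hr_reported}), "
          f"Wald = {wald:.3f} (reported {wald_reported})")
```

Small differences from the reported values are expected, since B and SE are themselves rounded to three decimals in the output.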
This site is a good read.